Abstract

We consider three problems in machine learning: 

• concept learning in the PAC model 

• mobile robot environment learning 

• learning-based approaches to protein structure prediction 

In the PAC framework, we give an efficient algorithm for learning any function on k terms by
general DNF. On the other hand, we show that in a well-studied restriction of the PAC model
where the learner is not allowed to use a more expressive hypothesis (such as general DNF),
learning most symmetric functions on k terms is NP-hard.

In the area of mobile robot environment learning, we introduce the problem of piecemeal
learning an unknown environment. The robot must learn a complete map of its environment, while
satisfying the constraint that periodically it has to return to its starting position (for
refueling, say). For environments that can be modeled as grid graphs with rectangular obstacles,
we give two piecemeal learning algorithms in which the robot traverses a linear number of edges.
For more general environments that can be modeled as arbitrary undirected graphs, we give a
nearly linear algorithm.

The final part of the thesis applies machine learning to the problem of protein structure
prediction. Most approaches to predicting local 3D structures, or motifs, are tailored towards
motifs that are already well-studied by biologists. We give a learning algorithm that is
particularly effective in situations where large numbers of examples of the motif are not known.
These are precisely the situations that pose significant difficulties for previously known
methods. We have implemented our algorithm and we demonstrate its performance on the coiled coil
motif.

Thesis Supervisor: Ronald L. Rivest, Professor of Computer Science
Thesis Supervisor: Bonnie A. Berger, Assistant Professor of Mathematics
Chapter 1



Introduction 



There are many reasons we want machines, or computers, to learn. A machine that can learn is
able to use its experience to help itself in the future. Such a machine can improve its
performance on some task after performing the task several times. This is useful for computer
scientists, since it means we do not have to consider all the possible scenarios a machine might
encounter. Such a machine is able to adapt to various conditions or environments, or even to
changing environments. A machine that is able to learn can also help push science forward. It
may be able to speed up the learning process for humans, or it may be able to discern patterns
or do things which humans are incapable of doing. For example, we may want to build a machine
that can learn patterns that aid in medical diagnosis, or that may be able to learn how to
understand and process speech. Or we might want to build an autonomous robot that can learn to
walk through difficult or unexpected terrain, or that can learn a map of its environment. This
robot could then be used to explore environments that are too dangerous for humans, such as the
surface of other planets.

In this thesis, we study three particular problems in machine learning. In order to study
any machine learning problem, we must first specify the model of learning we are interested
in. There are many different possible models, and a model should be chosen according to the
learning application we are interested in. Once we have specified the model we are looking at,
we can give algorithms and show results within the model. There are several things which any
"model of learning" must specify [69, 72, 44]:

1. Learner: Who is doing the learning? In this thesis, we consider the learner to be a
machine, such as a computer or a robot. Sometimes the machine is assumed to have
limited computational power (e.g., the machine is a finite automaton), but in this thesis
we assume that the machine is as powerful as a Turing machine.

2. Domain: What is being learned? One of the most well-studied types of learning is
concept learning, where the learner is trying to come up with a "rule" to separate positive
examples from negative examples. For example, the learner may be trying to distinguish
chairs from things which are not chairs. There are many other types of things that can
be learned, such as an unknown environment (e.g., a new city) or an unknown technique
(e.g., how to drive).

3. Prior Knowledge: What does the learner know about the domain initially? This generally
restricts the learner's uncertainty and/or biases and expectations about unknown
domains. This tells what the learner knows about what is possible or probable in the
domain. For example, the learner may know that the unknown concept is representable
in a certain way. That is, the unknown concept might be known to be representable as a
disjunction of features, or as a graph.

4. Information Source: How is the learner informed about the domain? The learner may
be given labeled examples. For instance, the learner may be given examples of things
which are chairs, and examples of things which are not chairs. The learner may get
information about a domain by asking questions of a teacher (e.g., "Is a stool a chair?").
The learner may get information about its domain by actively experimenting with it (e.g.,
it may learn a map of a new city by walking around in it).

5. Performance Criteria: How do we know whether, or how well, the learner has learned?
Different performance criteria include accuracy and efficiency. For accuracy, the learner
may be evaluated by its error rate, its correctness of description, or the number of
mistakes it made during learning. For efficiency, the learner may be evaluated on the amount
of computation it does and the amount of information it needs (e.g., the number of examples
it needs). In addition, the learner may be required to have a particular hypothesis
representation of an unknown concept, or it may only need to have predictive output (i.e.,
the learner does not need a representation of the unknown concept, just a way to label
new instances as either positive or negative).

Different applications require different models of machine learning. In this thesis, we
consider three models of machine learning. The first part of the thesis studies a theoretical
model of concept learning. For this model, we study learnability and give an efficient algorithm
for learning a family of concept classes. The second part of the thesis studies mobile robot
navigation and environment learning. We introduce a model of exploration, which we call piecemeal
learning, and give efficient algorithms for piecemeal learning unknown environments. The final
part of the thesis applies machine learning to the problem of protein structure prediction.
We introduce a learning technique that helps gather information on protein structures that
biologists are interested in, but do not know much about yet.

We now give a more detailed summary of this thesis, and outline some of its contributions
to machine learning, mobile robot navigation, and protein structure prediction.

Concept learning in the PAC framework 

Much of the machine learning literature has been devoted to the problem of concept learning.
We study concept learning in the Probably Approximately Correct (PAC) framework [74]. The
object of a PAC learning algorithm is to approximately infer an unknown concept that belongs
to some known concept class. For our purposes, it suffices to view the problem as finding
a concept consistent with a given set of labeled examples. Figure 1.1 shows the information
presented to the learner at the start of learning, and what the learner must produce in order
to learn. The examples are assumed to be a "representative sample" of future examples the
learner might see. Performance is measured by the number of examples used for learning,
the time complexity of the learning algorithm, and the accuracy of the learned concept. We
consider two standard versions of the PAC model: in one, the learner is required to produce as
output a hypothesis belonging to the same class as the concept to be learned, and in the other,
the learner's hypothesis can be any polynomial-time algorithm.

For this model, we study the problem of learning the concept classes of functions on k
terms. Concept classes that can be represented by functions on k terms include k-term DNF
(disjunctive normal form formulae with at most k terms), k-term exclusive-or, and r-of-k-term
threshold functions. We give an efficient algorithm for PAC-learning any function on k terms
by general DNF. We also show that for most symmetric functions on k terms, if the learner
is required to output a hypothesis of the same concept class, then learning is NP-complete.
Thus, our results illustrate the importance of hypothesis representation. In particular, for most
concept classes of symmetric functions on k terms, learning the concept by itself is hard, but
learning it by general DNF is easy.







Figure 1.1: Concept learning with labeled examples. (a) Initially, the learner is given a set
of labeled examples. The positive examples are denoted by +, and the negative examples are
denoted by −. (b) The goal of the learner is to find a concept consistent with these examples.
That is, the learner wants to find a rule that differentiates the positive examples from the
negative examples.



Environment learning 

In the second part of this thesis, we consider an active learning model where an autonomous
robot must learn a map of its environment (see Figure 1.2). No examples are presented to
the robot. Instead, it learns about the environment through active experimentation: it walks
around in the environment. We introduce the problem of piecemeal learning of an unknown
environment. The robot's goal is to learn a complete map of its environment, while satisfying
the constraint that it must return every so often to its starting position. The piecemeal
constraint models situations in which the robot must learn "a piece at a time." Unlike previous
environment learning work, our work does not assume that the robot has sufficient resources to
complete its learning task in one continuous phase; this is often an unrealistic assumption, as
robots have limited power. After some exploration, the robot may need to recharge or refuel.
Or, the robot may be exploring a dangerous environment, and after some time it may need to
"cool down" or get maintenance. Or, the robot might have some other task to perform, and
the piecemeal constraint enables "learning on the job."

The environment is modeled as an arbitrary, undirected graph, which is initially unknown
to the robot. The learner's performance is measured by the number of edges it traverses while
exploring. For environments that can be modeled as grid graphs with rectangular obstacles,
we give two piecemeal learning algorithms in which the robot explores every vertex and edge
in the graph by traversing a linear number of edges. For more general environments that can
be modeled by an undirected graph, we give a piecemeal learning algorithm in which the robot
traverses at most a nearly linear number of edges.






Figure 1.2: Environment learning. (a) Initially the learner only knows its starting location.
(b) The learner must build a map of its environment.



Learning-based methods for protein structure prediction 

In the last part of this thesis, we again turn to concept learning, but here the learner is given
both labeled and unlabeled examples (see Figure 1.3). Unlike the previous concept learning
model, here the labeled examples that the learner is given are not representative of the examples
that the learner will see; moreover, the learner knows that this is the case. Unlike the other work
in this thesis, the performance measure we use here is empirical and not theoretical. Within
this model, we look at the particular application of protein structure prediction.








Figure 1.3: Concept learning with labeled and unlabeled examples. (a) The learner is given a
set of labeled examples as well as a set of unlabeled examples. The positive examples are denoted
by +, the negative examples are denoted by −, and the unlabeled examples are denoted by
?. (b) The learner must find a concept which partitions these examples. The unlabeled points
within the circle are assumed positive, and the unlabeled points outside of the circle are assumed
negative.



The goal of this work is to use computational techniques to learn about protein structures
or folds which biologists do not yet know much about. Current techniques for predicting local
three-dimensional structures, or motifs, are tailored towards folds which are already well-studied
and documented by biologists. We give a learning algorithm that is particularly effective in
situations where this is not the case. We generalize the 2-stranded coiled coil domain to learn
3-stranded coiled coils, and perhaps other similar motifs. As a consequence of this work, we have
identified many new sequences that we believe contain coiled coil and coiled-coil-like structures.
These sequences contain regions that are not identified by the best previous computational
method, but are identified by our method. These sequences include mouse hepatitis virus,
human rotavirus, human T-cell lymphotropic virus, Human Immunodeficiency Virus (HIV) and
Simian Immunodeficiency Virus (SIV). Independently, recent laboratory work has predicted the
existence of a coiled-coil-like structure in HIV and SIV [19, 56], and our algorithm is able to
predict the regions of this structure to within a few residues. We hope that biologists will direct
their laboratory efforts towards testing other new candidate sequences which we identify.

Organization of thesis 

The thesis is organized in three self-contained chapters. In Chapter 2, we study the problem of
learning concept classes of functions on k terms in the PAC framework. In Chapter 3, we
introduce the problem of piecemeal learning unknown environments, and give efficient algorithms
for this problem. In Chapter 4, we study the problem of learning protein motifs. Finally, in
Chapter 5, we finish with some concluding remarks.



Chapter 2 



Learning functions on k terms 



2.1 Introduction 

Since its introduction, Valiant's distribution-free or PAC learning framework [74] has been a
well-studied model of concept learning. In this framework, the object of a learning algorithm is
to approximately infer an unknown target concept that belongs to some known concept class.
The learner is given examples chosen randomly according to a fixed but unknown distribution.
The goal of the learner is to find (with high probability) a hypothesis that accurately predicts
new instances as positive or negative examples of the concept. We consider here two standard
versions of this model: in one, the learner is required to produce as output a hypothesis
belonging to the same class as the target concept, and in the other, the learner's hypotheses may
be any polynomial-time algorithm [64, 50, 66]. Several examples are known of concept classes
that are hard to learn when hypotheses are restricted to belong to the same class as the target
concept but easy to learn when they may belong to a larger class. In particular, Pitt and
Valiant [64] showed that learning the class of $k$-term DNF formulas (that is, functions that can
be represented by a disjunction of $k$ monomials) is NP-hard if the learner is required to produce
a $k$-term DNF formula, but is easy if the learner may use a representation of $k$-CNF formulas.
In this chapter, we show that this phenomenon occurs for a broad class of formulas. In
particular, given constant $k$ and function $f$, let $C_k^f$ be the class of concepts of the form
$f(T_1, \ldots, T_k)$ where $T_1, \ldots, T_k$ are monomials. So, for example, if $f$ is the OR
function then $C_k^f$ is the class of $k$-term DNF formulas. We show that for any symmetric
function $f$ (that is, $f$ depends on only the number of inputs which are 1), learning the class
$C_k^f$ by hypothesis class $C_k^f$ is NP-hard except for $f \in \{\wedge, \neg\wedge, T, F\}$.
The hardness result completely characterizes the complexity of learning $C_k^f$ by $C_k^f$ for
symmetric functions $f$. For $f \in \{T, F\}$, learning $C_k^f$ is trivial, and for
$f \in \{\wedge, \neg\wedge\}$, $C_k^f$ is the class of conjunctions or disjunctions respectively,
so learning $C_k^f$ by $C_k^f$ is easy by a standard procedure for learning monomials.

On the other hand, we also present a polynomial-time algorithm that learns the class $C_k$
of all concepts $f(T_1, \ldots, T_k)$, where $f$ is any $\{0,1\}$-valued function of $k$ inputs and
$T_1, \ldots, T_k$ are monomials, using a hypothesis class of general DNF. As a consequence, this
algorithm will learn by DNF the concept classes $C_k^f$ for which learning $C_k^f$ by $C_k^f$ is
NP-hard.

A strategy for learning the special case of $k$-term DNF formulas is to learn by the hypothesis
class of $k$-CNF (that is, conjunctions of disjunctions of size $k$). Every $k$-term DNF can be
written as a $k$-CNF (since we can "distribute out" the $k$-term DNF) and $k$-CNF can be easily
learned by standard procedures. Suppose, however, that we wish to learn in the same manner
another class of concepts $C_k^f$ (that is, other than $k$-term DNF) for which learning $C_k^f$ by
$C_k^f$ is NP-hard. Our results and related results by Fischer and Simon [41] show that
exclusive-or (XOR) is one such function. In this case, an XOR of $k$ monomials need not be
representable as a $k$-CNF or as a $k$-DNF (for example, $x_1 x_2 \oplus x_3$ written as a DNF
requires one term of size 3, and written as a CNF requires one clause of size 3). In addition, an
XOR of $k$ monomials need not have a representation as a conjunction of XORs of size $k$. Thus,
the standard strategy for learning $k$-term DNF or $k$-term CNF will not work for learning
$k$-term XOR.

Instead, our algorithm is based on a different strategy. Roughly, we use the fact that a
monomial can be made false just by setting one of the literals that appears in it to 0. So,
given a concept represented by a function on $k$ unknown terms $T_1, \ldots, T_k$, if we are able
to "guess" literals that appear in $k - 1$ of the monomials and consider only examples in which
these monomials are false, we can then focus on the term remaining. Then, once we have been
able to classify the examples that satisfy only one term of $T_1, \ldots, T_k$, we can focus on
those that satisfy pairs of terms, and so on.
2.2 Notation and definitions

We will consider learning over the Boolean domain $X_n = \{0,1\}^n$. An example is an element
$v \in \{0,1\}^n$ and a concept $c$ is a boolean function on examples. A concept class is a
collection of concepts. For a given target concept $c$, a labeled example for $c$ is a pair
$(v, c(v))$ where $v$ is a positive example if $c(v) = 1$ and a negative example if $c(v) = 0$.
For convenience, we will at times think of an example as a collection of variables or attributes
$x$. In this case, for an example $v$ and variable $x \in X_n$, let $v(x) = 1$ if the bit of $v$
corresponding to $x$ is 1, and 0 otherwise. Also, we will use $|c|$ to denote the size of concept
$c$ under some reasonable encoding.

Let $k$ be a constant. Define the concept class $C_k$ to be the set of all concepts
$f(T_1, \ldots, T_k)$ where $T_1, \ldots, T_k$ are monomials (conjunctions of literals) and $f$ is
any $\{0,1\}$-function on $k$ boolean inputs. For example, class $C_2$ includes the concept
$x_1 \bar{x}_2 \oplus x_3 x_4 \bar{x}_5$, where "$\oplus$" denotes the XOR function. For a given
function $f$, let $C_k^f$ be those concepts in $C_k$ of the form $f(T_1, \ldots, T_k)$ for the
given $f$. We say that a function $f$ is symmetric if the value of $f$ depends only on the
number of inputs that are 1. For a symmetric function $f$ and integer $i$, we let $f(i)$ denote the
value of $f$ when exactly $i$ of its inputs are 1.
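As a purely illustrative aid, the definitions above can be written out in a few lines of Python. The encoding of a monomial as a dictionary, and the names `satisfies`, `make_concept`, and `is_symmetric`, are assumptions made for this sketch and do not come from the text.

```python
from itertools import product

# A monomial is encoded as {variable_index: required_bit}; this encoding is
# an assumption made for illustration only.
def satisfies(v, monomial):
    """Return True if example v (a tuple of 0/1 bits) satisfies the monomial."""
    return all(v[i] == bit for i, bit in monomial.items())

def make_concept(f, terms):
    """Build the concept f(T_1, ..., T_k) as a function on examples."""
    return lambda v: f(tuple(int(satisfies(v, t)) for t in terms))

def is_symmetric(f, k):
    """Check by brute force whether f depends only on the number of 1 inputs."""
    value_by_count = {}
    for bits in product((0, 1), repeat=k):
        if value_by_count.setdefault(sum(bits), f(bits)) != f(bits):
            return False
    return True

xor = lambda bits: sum(bits) % 2        # symmetric
first_input = lambda bits: bits[0]      # not symmetric for k >= 2

# A concept in C_2: an XOR of two monomials over five variables (0-indexed).
c = make_concept(xor, [{0: 1, 1: 0}, {2: 1, 3: 1, 4: 0}])
```

Here `c` labels an example positive exactly when it satisfies one of the two monomials but not both.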

We study learning in the distribution-free or Probably Approximately Correct (PAC) learning
model [74, 2]. In the PAC learning model, we assume that the learning algorithm has
available an oracle EXAMPLES($c$) that, when queried, produces a labeled example $(v, c(v))$
according to a fixed but unknown probability distribution $D$. If $C$ and $H$ are concept classes,
we say that algorithm $A$ learns $C$ by $H$ if for some polynomial $p$, for all target concepts
$c \in C$, distributions $D$, and error parameters $\epsilon$ and $\delta$: algorithm $A$ halts in
time $p(n, \frac{1}{\epsilon}, \frac{1}{\delta}, |c|)$ and outputs a hypothesis $h \in H$ that with
probability at least $1 - \delta$ has error at most $\epsilon$. The error of a hypothesis $h$ is
the probability that $h(v) \neq c(v)$ when $v$ is chosen from the distribution $D$.

For the purposes of our positive results, it will be enough to consider the following sufficient
condition for learnability [26]. An algorithm $A$ is an "Occam algorithm" for $C$ if on any sample
(collection of labeled examples) of size $m$ consistent with some $c \in C$, algorithm $A$
produces a consistent hypothesis of size at most $|c|^{\beta} m^{\alpha}$ for constants
$\alpha < 1$, $\beta \geq 1$. Blumer et al. show that any Occam algorithm for $C$, producing
hypotheses from $H$, will learn $C$ by $H$.
2.3 The learning algorithm

In this section, we present an algorithm that learns the class $C_k$ by the hypothesis class of
general DNF. To illustrate the strategy used, let us consider first the problem of learning an
XOR of two monotone monomials.

Suppose the target concept is $c = T_1 \oplus T_2$ for monotone monomials $T_1$ and $T_2$. We know
each positive example $v$ satisfies one of $T_1$ or $T_2$ and fails to satisfy the other, and so
has some $v_i = 0$ for $x_i$ in exactly one of $T_1$ and $T_2$. Given a set $S$ of examples, let
$S_i$, for $1 \leq i \leq n$, be the set of those examples $v$ for which $v_i = 0$. If a variable
$x_i$ is contained in exactly one of $\{T_1, T_2\}$, say $x_i$ is in $T_1$, then the monomial
$\bar{x}_i \wedge T_2$ is satisfied by every positive example in $S_i$ and no negative example in
$S$. Therefore, we can actually find a monomial consistent with the positive examples in this
$S_i$ and the negative examples in $S$, using the standard monomial learning procedure.

So, we can learn an XOR of two terms as follows. For each variable $x_i$, find a monomial
$M_i$ consistent with positive examples in $S_i$ and with all negative examples, if such a
monomial exists. Then, output as hypothesis the disjunction of the $M_i$'s. The hypothesis
produced is consistent with every negative example since no negative example satisfies any $M_i$.
Also, since every positive example lies in some $S_i$ for $x_i$ in exactly one of
$\{T_1, T_2\}$, for each positive example we will have found some monomial it satisfies.
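The two-term strategy can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: examples are tuples of bits, the function names are hypothetical, and each candidate $M_i$ is taken to be $\bar{x}_i$ conjoined with the most specific monotone monomial satisfied by the positive examples in $S_i$.

```python
from itertools import product

def most_specific_monomial(examples, n):
    """Standard monomial learner: keep every variable that is 1 in all examples."""
    return frozenset(j for j in range(n) if all(v[j] == 1 for v in examples))

def learn_xor_of_two(sample, n):
    """Return a DNF as a list of (i, T) pairs, each meaning (NOT x_i) AND all x_j, j in T."""
    positives = [v for v, label in sample if label == 1]
    negatives = [v for v, label in sample if label == 0]
    hypothesis = []
    for i in range(n):
        s_i = [v for v in positives if v[i] == 0]   # positive examples in S_i
        if not s_i:
            continue
        t = most_specific_monomial(s_i, n)
        # Keep M_i only if no negative example satisfies it.
        if not any(w[i] == 0 and all(w[j] == 1 for j in t) for w in negatives):
            hypothesis.append((i, t))
    return hypothesis

def evaluate(hypothesis, v):
    """Evaluate the DNF hypothesis on example v."""
    return int(any(v[i] == 0 and all(v[j] == 1 for j in t) for i, t in hypothesis))

# Hypothetical target for the demonstration: x_0 x_1 XOR x_2 over n = 4 variables.
n = 4
target = lambda v: (v[0] & v[1]) ^ v[2]
sample = [(v, target(v)) for v in product((0, 1), repeat=n)]
h = learn_xor_of_two(sample, n)
```

On this sample the returned hypothesis agrees with the label of every example, as the argument above predicts.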

We now present an Occam algorithm based on the above strategy that learns the class $C_k$
using a hypothesis class of DNF. Without loss of generality, we may assume that the target
concept is some $f(T_1, \ldots, T_k)$ where the $T_i$ are monotone (we can think of non-monotone
terms as monotone terms over the attribute space
$\{x_1, \bar{x}_1, x_2, \bar{x}_2, \ldots, x_n, \bar{x}_n\}$). The algorithm Learn-$k$-Term takes as
input a set $S$ of $m$ examples consistent with some function $f(T_1, \ldots, T_k)$ on $k$
monotone monomials and outputs a DNF of size $O(n^{k+1})$ consistent with the given examples.

The basic idea of Learn-$k$-Term is as follows. In the first iteration, the algorithm "handles"
those positive examples that satisfy none of the terms. That is, if there are any such positive
examples, the algorithm finds a set of monomials such that each of those positive examples
satisfies one of the monomials. These monomials are then added to the DNF being built. In
the second iteration, the algorithm tries to find a set of monomials for those positive examples
that satisfy exactly one of the terms. This process is continued so that at each iteration the
algorithm focuses on examples that satisfy an increasing number of terms. Thus, at each value
of $r$ in the loop, the algorithm finds terms to handle all the positive examples that fail to
satisfy exactly $r$ terms of the target concept (that is, that satisfy exactly $k - r$ terms). The
ordering of $r = k$ down to 0 is important to ensure that needed terms are not thrown away in
step 9. Note that in step 5, we allow the $i_j$ to be the same. This is done for purposes of
simpler analysis; the algorithm would still work if we just considered the $\binom{n}{r}$ sets of
$r$ different variables.



Learn-$k$-Term($S$)

1   Let P = the positive examples in S
2   Let N = the negative examples in S
3   Initialize the DNF hypothesis h to {}.
4   For r = k down to 0 Do
5       For each set of r variables $\{x_{i_1}, \ldots, x_{i_r}\}$ Do
6           Let M be the monomial $\bar{x}_{i_1} \cdots \bar{x}_{i_r}$.
7           Let U be the set of those examples $v = (v_1, \ldots, v_n) \in P$
            such that $v_{i_1} = v_{i_2} = \cdots = v_{i_r} = 0$. That is, U is the set
            of examples in P satisfying the term M.
8           Let T be the monomial that is the conjunction of all $x_i$
            such that every example $v \in U$ has $v_i = 1$. (T is the most
            specific monotone monomial satisfied by all examples in U.)
9           If no negative example in N satisfies term $MT = \bar{x}_{i_1} \cdots \bar{x}_{i_r} T$
10          Then
11              add MT as a term to the hypothesis h
12              let $P \leftarrow P - U$.
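The procedure can also be written as a short Python sketch. It follows the numbered steps under some assumptions made only for illustration: examples are tuples of bits, a term $MT$ is stored as a pair of index sets `(zeros, ones)`, and iterations with an empty $U$ are skipped (when $U$ is empty, $T$ contains every variable, so skipping the term cannot affect consistency with the sample). The names `term_holds`, `learn_k_term`, and `evaluate_dnf` are hypothetical.

```python
from itertools import product

def term_holds(zeros, ones, w):
    """True if w sets every index in `zeros` to 0 and every index in `ones` to 1."""
    return all(w[i] == 0 for i in zeros) and all(w[j] == 1 for j in ones)

def learn_k_term(sample, n, k):
    """Return a DNF (list of (zeros, ones) terms) consistent with the sample."""
    P = {v for v, label in sample if label == 1}              # step 1
    N = [v for v, label in sample if label == 0]              # step 2
    h = []                                                    # step 3
    for r in range(k, -1, -1):                                # step 4: r = k down to 0
        for idx in product(range(n), repeat=r):               # step 5: repeats allowed
            zeros = frozenset(idx)                            # step 6: the monomial M
            U = {v for v in P if all(v[i] == 0 for i in zeros)}   # step 7
            if not U:
                continue                                      # shortcut, see lead-in
            ones = frozenset(j for j in range(n)
                             if all(v[j] == 1 for v in U))    # step 8: the monomial T
            if not any(term_holds(zeros, ones, w) for w in N):    # step 9
                h.append((zeros, ones))                       # step 11
                P -= U                                        # step 12
    return h

def evaluate_dnf(h, w):
    """Evaluate the DNF hypothesis on example w."""
    return int(any(term_holds(zeros, ones, w) for zeros, ones in h))

# Hypothetical target for the demonstration: x_0 x_1 XOR x_2 x_3, so k = 2, n = 4.
n, k = 4, 2
target = lambda v: (v[0] & v[1]) ^ (v[2] & v[3])
sample = [(v, target(v)) for v in product((0, 1), repeat=n)]
h = learn_k_term(sample, n, k)
```

In this small run the hypothesis `h` classifies every example in `sample` correctly and has far fewer than $n^k + n^{k-1} + \cdots + 1$ terms, since each added term removes at least one positive example from `P`.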



Algorithm Learn-$k$-Term clearly runs in time polynomial in $m$ and $n^k$, so we just need to
prove the following theorem.

Theorem 1 Algorithm Learn-$k$-Term, on $m$ examples consistent with some function $f$ of $k$
monotone monomials over $\{0,1\}^n$, produces a consistent DNF hypothesis of size $O(n^{k+1})$.

Proof: First notice the following facts. The DNF $h$ produced by algorithm Learn-$k$-Term
has at most $n^k + n^{k-1} + \cdots + n = O(n^k)$ terms of size $O(n)$, so the size of the
hypothesis is at most $O(n^{k+1})$. Also, the hypothesis $h$ is consistent with the set $N$ of
negative examples, since in step 9 any term that some negative example satisfies will never be
included in the DNF. Thus all we need to do is prove that for every positive example $v \in P$,
there is some term added to $h$ which is satisfied by $v$.

Let $f(T_1, \ldots, T_k)$ be the target concept where $T_1, \ldots, T_k$ are monotone monomials.
Let $S_j$ for $j \in \{0, \ldots, k\}$ be the set of those positive examples seen that satisfy
exactly $j$ of $T_1, \ldots, T_k$ (if $f$ is the XOR function, for instance, then the sets $S_j$
for even values of $j$ are all empty). We will argue by induction on the index $j$; in particular
we will argue that after the iteration of the loop of Learn-$k$-Term in which $r = k - j$, all
positive examples $v \in S_j$ have been "captured" by (that is, they satisfy) some term in $h$.

$j = 0$, $r = k$: Let $v$ be a positive example that satisfies none of $T_1, \ldots, T_k$. If such
an example exists, then any other example satisfying none of $T_1, \ldots, T_k$ must also be a
positive example. There must be some collection of variables
$x_{i_1} \in T_1, \ldots, x_{i_k} \in T_k$ (not necessarily all different) such that
$v_{i_1} = v_{i_2} = \cdots = v_{i_k} = 0$, or otherwise $v$ would satisfy some term.

Consider the iteration in which the monomial $M$ is $\bar{x}_{i_1} \cdots \bar{x}_{i_k}$. Example
$v$ satisfies $M$ and so is put into $U$ in step 7. Any other example satisfying $M$ cannot
satisfy any of $T_1, \ldots, T_k$ (by definition of $x_{i_1}, \ldots, x_{i_k}$) and therefore must
be positive. So, a term $MT$ satisfied by $v$ will be added to $h$ in step 9.

$j > 0$, $r = k - j$: Let $v$ be a positive example that satisfies exactly $j$ of the terms
$T_1, \ldots, T_k$; for convenience, assume $v$ satisfies terms $T_{r+1}, \ldots, T_k$. Any other
example satisfying exactly those terms and no others must also be positive. Let
$x_{i_1} \in T_1, \ldots, x_{i_r} \in T_r$ be a collection of not necessarily distinct variables
such that $v_{i_1} = \cdots = v_{i_r} = 0$.

At the iteration in which the monomial $M$ is $\bar{x}_{i_1} \cdots \bar{x}_{i_r}$, example $v$ is
put into set $U$ in step 7 and the term $T$ created is satisfied by $v$. In fact, $T$ also has in
it all variables contained in the terms $T_{r+1}, \ldots, T_k$. The reason is as follows:

    Suppose $x_i$ is contained in one of $T_{r+1}, \ldots, T_k$ but not in $T$. Then, there must
    exist some positive example $w \in U$ such that $w_i = 0$. So, example $w$ fails to
    satisfy at least one of $T_{r+1}, \ldots, T_k$ in addition to not satisfying any of
    $T_1, \ldots, T_r$. But, this means that $w$ satisfies fewer than $j$ terms and so must
    already have been removed from $P$ in an earlier iteration by our inductive hypothesis.
    (Note that it is for this reason that algorithm Learn-$k$-Term begins with $r = k$ and
    works down to $r = 0$.)

So, any example satisfying $MT$ must satisfy all of $T_{r+1}, \ldots, T_k$ (since it satisfies
$T$) and none of $T_1, \ldots, T_r$ (since it satisfies $M$) and therefore must be positive. Thus,
term $MT$ will be added to $h$ in step 9.

So, we have shown that algorithm Learn-$k$-Term, on an input of any size consistent with some
function $f$ of $k$ monotone monomials over $\{0,1\}^n$, produces a consistent hypothesis of size
$O(n^{k+1})$ in time polynomial in $m$ and $n^k$. ∎

Corollary 1 The concept class $C_k$ is learnable by DNF in the distribution-free model.

In fact, if we assume without loss of generality that the target concept $c = f(T_1, \ldots, T_k)$
has the property that $f(00 \cdots 0) = 0$ (otherwise we will learn $\bar{c}$), then we can start
algorithm Learn-$k$-Term at $r = k - 1$ and produce a DNF of only $O(n^{k-1})$ terms instead of one
of $O(n^k)$ terms. So, for example, we can learn a $k$-term DNF with a DNF hypothesis of
$O(n^{k-1})$ terms each of size $O(n)$. This differs from the standard procedure of learning
$k$-term DNF, which gives a $k$-CNF of $O(n^k)$ clauses of size $k = O(1)$. Moreover, if we know
that $f$ outputs 0 when only a few of its inputs are 1, then we can produce a hypothesis of
smaller size. For example, if $f$ is the majority function, then we can start Learn-$k$-Term with
$r = k/2$ and get a DNF of only $O(n^{k/2})$ terms.

2.3.1 Decision lists 

An alternative way to learn $C_k$ is to learn by the class of $k$-decision lists ($k$-DLs).¹ In
fact, the proof for Algorithm Learn-$k$-Term can be modified to show any concept in $C_k$ can be
written as a $k$-decision list. In particular, let $c = f(T_1, \ldots, T_k)$ be some concept in
$C_k$. The decision list will consist of rules of the form "if $M_i$ then $b_i$," where each $M_i$
will correspond to one of the monomials $M$ in algorithm Learn-$k$-Term.



¹ A $k$-decision list is a function of the form: "if $M_1$ then $b_1$, else if $M_2$ then $b_2$,
else ... else if $M_m$ then $b_m$, else $b_{m+1}$," where the $M_i$ are monomials of size at most
$k$ and the $b_i$ are each either 0 or 1.
Let $b$ be the value of $c(x)$ when $x$ satisfies none of $T_1, \ldots, T_k$. Put on the top of the
decision list all rules of the form "if $\bar{x}_{i_1} \bar{x}_{i_2} \cdots \bar{x}_{i_k}$ then
$b$," where $x_{i_1} \in T_1, \ldots, x_{i_k} \in T_k$. Let us say that a set of rules "captures"
an example if the example satisfies the if-portion of one of them. Thus, we have now captured all
examples that satisfy none of the $T_i$ (and have classified them correctly).

Inductively suppose we have created rules that capture (and correctly classify) all examples
satisfying $j - 1$ or fewer of the $k$ terms. Append onto the bottom of the decision list the
following rules. For each subset $\{T_{t_1}, \ldots, T_{t_{k-j}}\} \subseteq \{T_1, \ldots, T_k\}$
such that all examples which satisfy exactly the $j$ terms remaining are positive, add all rules
of the form: "if $\bar{x}_{i_1} \bar{x}_{i_2} \cdots \bar{x}_{i_{k-j}}$ then 1," where
$x_{i_1} \in T_{t_1}, \ldots, x_{i_{k-j}} \in T_{t_{k-j}}$. For each subset
$\{T_{t_1}, \ldots, T_{t_{k-j}}\} \subseteq \{T_1, \ldots, T_k\}$ such that all examples satisfying
exactly the $j$ terms remaining are negative, add all rules of the form: "if
$\bar{x}_{i_1} \bar{x}_{i_2} \cdots \bar{x}_{i_{k-j}}$ then 0," where
$x_{i_1} \in T_{t_1}, \ldots, x_{i_{k-j}} \in T_{t_{k-j}}$.

Finally, the default case of the decision list is the rule "else $b$," where $b$ is the classification
of examples satisfying all the terms $T_i$. It is clear from the above arguments that this k-decision
list is logically equivalent to the k-term function.
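
The construction above can be sketched in a few lines of Python. This is an illustrative sketch only, not the thesis's implementation: the representation of a literal as a pair (variable index, required bit) and the function names are assumptions made for the example. The rules are emitted level by level, for j = 0, 1, ..., k - 1 satisfied terms, and each rule negates one literal from every term assumed to be false.

```python
from itertools import combinations, product

def build_decision_list(terms, f):
    """terms: list of k terms; each term is a list of literals (var, bit),
    meaning "x[var] == bit". f maps a tuple of k term values to 0 or 1.
    Returns (rules, default); each rule is (monomial, label)."""
    k = len(terms)
    rules = []
    for j in range(k):  # j = number of terms assumed satisfied
        for falsified in combinations(range(k), k - j):
            pattern = tuple(0 if t in falsified else 1 for t in range(k))
            label = f(pattern)
            # One negated literal from each falsified term.
            for choice in product(*(terms[t] for t in falsified)):
                monomial = frozenset((v, 1 - b) for v, b in choice)
                rules.append((monomial, label))
    return rules, f((1,) * k)  # default: all k terms satisfied

def eval_decision_list(rules, default, x):
    """Return the label of the first rule whose monomial x satisfies."""
    for monomial, label in rules:
        if all(x[v] == b for v, b in monomial):
            return label
    return default

def eval_function(terms, f, x):
    """Evaluate f(T_1(x), ..., T_k(x)) directly."""
    return f(tuple(int(all(x[v] == b for v, b in term)) for term in terms))
```

Every monomial produced has at most k literals, so the result is a k-decision list. A brute-force comparison over all inputs of a small instance is an easy sanity check of the equivalence claimed in the text.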

The mistake-bound model is a model of learning more stringent than the PAC model; here,
unlabeled examples are presented to the learner in an arbitrary order, and after each one the
learner must predict its classification before being told the correct value. The learner is judged
by the total number of mistakes it makes in such a sequence. Using the halving algorithm [54],
k-decision lists can be learned in the mistake-bound model with $O(n^k)$ mistakes. Thus we have
the following theorem:

Theorem 2 All functions on k terms can be learned in the mistake-bound model with $O(n^k)$
mistakes, using a representation of k-decision lists.
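
For readers unfamiliar with the halving algorithm mentioned above, the following is a minimal generic sketch, assuming a finite hypothesis class given explicitly as a list of Python functions. The helper `monotone_conjunctions` is a toy hypothesis class invented for this illustration; it is not the class of k-decision lists, which would be far too large to enumerate this way.

```python
from itertools import combinations

def halving_run(hypotheses, examples, target):
    """Predict by majority vote of the surviving hypotheses, then discard
    every hypothesis that disagrees with the revealed label.
    Returns the number of prediction mistakes."""
    version_space = list(hypotheses)
    mistakes = 0
    for x in examples:
        votes_for_one = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes_for_one >= len(version_space) else 0
        truth = target(x)
        if prediction != truth:
            mistakes += 1  # at least half of the version space was wrong
        version_space = [h for h in version_space if h(x) == truth]
    return mistakes

def monotone_conjunctions(n):
    """Toy hypothesis class (an assumption for this example): all 2^n
    conjunctions of un-negated variables over n bits."""
    hyps = []
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            hyps.append(lambda x, s=subset: int(all(x[i] for i in s)))
    return hyps
```

Each mistake removes at least half of the remaining hypotheses, so when the target belongs to the class the number of mistakes is at most the base-2 logarithm of the class size.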

In fact, we can learn k-term functions in an "attribute-efficient" sense, where the number
of mistakes is polynomial in the number of relevant variables (variables that appear in some
term $T_i$) and is only logarithmic in the number of irrelevant variables. This uses a result of
Littlestone [54] as follows.

An alternation in a decision list is a pair of adjacent rules such that the boolean classification
values for the rules differ. By appropriately ordering the rules in the decision list construction
above (listing the "negative rules" before the "positive rules" on alternate $j$ values) one can
see that for any k-term function there is a logically equivalent k-decision list with at most $k$
alternations. Such a decision list can be thought of as a function in the form:

if ($M_{1,1}$ OR $M_{1,2}$ OR ... OR $M_{1,m_1}$) then $b_1$, else if ($M_{2,1}$ OR $M_{2,2}$ OR ... OR
$M_{2,m_2}$) then $b_2$, else ... else if ($M_{k-1,1}$ OR $M_{k-1,2}$ OR ... OR $M_{k-1,m_{k-1}}$) then $b_{k-1}$,
else $b_k$,

where $b_i = 1 - b_{i-1}$.

Decision lists with small numbers of alternations can be written as linear threshold functions
over the monomials $M_{i,j}$, with not too large integral weights. For instance, if $b_{k-1} = 1$, $k$ is odd,
and $m$ is the sum of the $m_i$, the above decision list can be written as:

$$
\begin{aligned}
&(M_{k-1,1} + \cdots + M_{k-1,m_{k-1}}) - m\,(M_{k-2,1} + \cdots + M_{k-2,m_{k-2}}) \\
&\quad + m^2 (M_{k-3,1} + \cdots + M_{k-3,m_{k-3}}) - \cdots \\
&\quad - m^{k-2} (M_{1,1} + \cdots + M_{1,m_1}) > 0.
\end{aligned}
$$
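
The threshold form can be checked numerically on small instances. The sketch below is an illustration under stated assumptions rather than the thesis's construction verbatim: monomials are restricted to monotone ones (tuples of variable indices), each block is assumed non-empty, the last block has label 1, and the default label is 0, as in the example in the text. Block $i$ of $k-1$ blocks receives weight $\pm m^{k-1-i}$, with the sign given by its label.

```python
def eval_blocked_list(blocks, labels, default, x):
    """blocks[i] is a list of monotone monomials (tuples of variable
    indices); the first block containing a satisfied monomial decides."""
    for monomials, label in zip(blocks, labels):
        if any(all(x[v] for v in mono) for mono in monomials):
            return label
    return default

def eval_threshold(blocks, labels, x):
    """Weighted sum over satisfied monomials; earlier blocks get weights
    that dominate everything after them. Output 1 iff the sum is > 0."""
    m = sum(len(monomials) for monomials in blocks)
    last = len(blocks) - 1
    total = 0
    for i, (monomials, label) in enumerate(zip(blocks, labels)):
        weight = m ** (last - i)
        sign = 1 if label == 1 else -1
        satisfied = sum(1 for mono in monomials if all(x[v] for v in mono))
        total += sign * weight * satisfied
    return 1 if total > 0 else 0
```

The weights work because the monomials in all later blocks together contribute strictly less than $m^{d}$ in absolute value when the first satisfied block has weight $m^{d}$, so the sign of the sum is the sign of that block.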

If only $r$ variables are relevant to the k-term function, then the number of rules $m$ is at
most $r^k$. Therefore, the maximum weight in the threshold function is $r^{k^2}$.

Littlestone [54] gives an algorithm that can be used to learn such a function, where the
number of mistakes is at most $O((m r^{k^2})^2 \log(n^k)) = O(k r^{2k + 2k^2} \log n)$. Thus, if the number $r$
of relevant variables is small, this can be a savings in the number of mistakes made. Thus we
have the following theorem:

Theorem 3 Any function on k terms can be learned with $O(k r^{2k + 2k^2} \log n)$ mistakes, where $r$
is the number of relevant variables.

2.4 Hardness results 

In this section, we show that learning the class $C_{k,f}$ often requires allowing the learning algorithm
a more expressive hypothesis class than $C_{k,f}$. In the previous section, we gave an algorithm that
learns the concept class of functions on $k$ terms using the hypothesis class of general DNF. On
the other hand, we now show that when learning the class $C_{k,f}$, if the algorithm must produce
a hypothesis from the class $C_{k,f}$, the problem can become NP-hard. In particular, we show
that for any symmetric function $f$, learning the class $C_{k,f}$ by hypothesis class $C_{k,f}$ is NP-hard
except for $f \in \{\wedge, \neg\wedge, T, F\}$. The hardness result completely characterizes the complexity of
learning $C_{k,f}$ by $C_{k,f}$ for symmetric functions $f$. For $f \in \{T, F\}$, learning $C_{k,f}$ is trivial, and for
$f \in \{\wedge, \neg\wedge\}$, $C_{k,f}$ is the class of conjunctions or disjunctions respectively, so learning $C_{k,f}$ by
$C_{k,f}$ is easy by a standard procedure. We show the following:

Theorem 4 For any symmetric function $f$ on $k$ inputs except for $f \in \{\wedge, \neg\wedge, T, F\}$, learning
the class $C_{k,f}$ by $C_{k,f}$ is NP-hard.

This theorem extends the work of Pitt and Valiant [64], which shows that learning the class
of k-term DNF formulas is NP-hard if the learner is required to produce a k-term DNF formula.
Before giving the proof of Theorem 4, we first provide some intuition. For $k \geq 3$, the proof
of Pitt and Valiant is essentially a reduction from graph k-colorability.² Their reduction is as
follows. Given the graph, they create a variable $x_i$ for each vertex $v_i \in V$. They then create
one positive example for each vertex so that the example corresponding to vertex $i$ has bit
$i$ set to 0 and all other bits set to 1. They also create one negative example for each edge
such that the example corresponding to edge $(i,j)$ has bits $i$ and $j$ set to 0 and the other
bits set to 1. They then show that the set of examples is consistent with a disjunction of $k$
terms if and only if $G$ is k-colorable. Their proof does not work for more general symmetric
functions $f$ of $k$ terms. In particular, when $f$ is a symmetric function other than OR (e.g.,
when the concept class is 4-term exclusive-or formulas), using their reduction it is possible to
find a formula $f(T_1, T_2, \ldots, T_k)$ that correctly classifies all positive and negative examples, but
the corresponding coloring is invalid. The basic problem is that unlike the case of disjunction,
for arbitrary $f$, as the number of inputs that are 1 increases, the value of $f$ can switch back
and forth between 1 and 0. To solve this problem, we introduce enough variables and examples
for each edge such that $x_i$ and $x_j$ are forced to occur in different terms. We can use this
technique to reduce graph k-colorability to learning any symmetric function on $k$ terms (except
$\wedge, \neg\wedge, T, F$).

² The graph k-colorability problem is: given a graph $G = (V, E)$ and a positive integer $k$, does there exist a
function $f : V \to \{1, 2, \ldots, k\}$ such that $f(u) \neq f(v)$ whenever $(u, v) \in E$? That is, using at most $k$ colors, is
it possible to assign a color to each vertex in the graph such that for any edge, its vertices are given different
colors?
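
The Pitt and Valiant examples described above are easy to generate. The sketch below is illustrative only. In particular, the way terms are built from a coloring (term $c$ is the conjunction of $x_v$ over all vertices $v$ not colored $c$) is the standard construction for the disjunction case and is stated here as an assumption, since the text does not spell it out. Vertices are assumed to be numbered 0 to n-1 and colors 0 to k-1.

```python
def pitt_valiant_examples(n, edges):
    """One positive example per vertex (only bit i is 0) and one negative
    example per edge (only bits i and j are 0)."""
    positives = [tuple(0 if b == i else 1 for b in range(n)) for i in range(n)]
    negatives = [tuple(0 if b in (i, j) else 1 for b in range(n))
                 for (i, j) in edges]
    return positives, negatives

def terms_from_coloring(n, k, coloring):
    """Term c contains x_v for every vertex v whose color is not c."""
    return [[v for v in range(n) if coloring[v] != c] for c in range(k)]

def dnf(terms, x):
    """Evaluate a monotone DNF: OR over terms of AND over their variables."""
    return int(any(all(x[v] for v in term) for term in terms))

def consistent(terms, positives, negatives):
    return (all(dnf(terms, x) == 1 for x in positives)
            and all(dnf(terms, x) == 0 for x in negatives))
```

With a proper coloring, the example for vertex $i$ satisfies the term of $i$'s own color, while the example for edge $(i,j)$ falsifies every term because each term contains $x_i$ or $x_j$.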

To show Theorem 4, we first consider the concept class $C^{\mathrm{mon}}_{k,f} = \{f(T_1, \ldots, T_k)\}$ where
$T_1, \ldots, T_k$ are monotone monomials, and show that learning $C^{\mathrm{mon}}_{k,f}$ by $C^{\mathrm{mon}}_{k,f}$ is NP-hard. We
then give an extension of the argument that shows that learning $C^{\mathrm{mon}}_{k,f}$ by $C_{k,f}$ is NP-hard. This
implies Theorem 4.

Theorem 5 For any symmetric function $f$ on $k$ inputs except $f \in \{\wedge, \neg\wedge, T, F\}$, learning the
class $C^{\mathrm{mon}}_{k,f}$ by $C^{\mathrm{mon}}_{k,f}$ is NP-hard.

Proof: First note that if $k = 2$ then the only functions $f$ with $f \notin \{\wedge, \neg\wedge, T, F\}$ are the
functions $\{\vee, \neg\vee, \oplus, \neg\oplus\}$. The proof of [64] for 2-term DNF can be applied directly for these
cases; so, we assume that $k \geq 3$. Without loss of generality, we assume that $f(k-1) = 0$; that
is, $f$ outputs 0 when exactly $k-1$ of its inputs are 1. Otherwise, we show that learning $C^{\mathrm{mon}}_{k,f'}$
by $C^{\mathrm{mon}}_{k,f'}$ for $f' = \neg f$ is NP-hard and the result follows.

The proof is a reduction from graph k-colorability. Given a graph $G = (V, E)$, we create
labeled examples over $n = |V| + (k-2)|E|$ variables such that there exists $c \in C^{\mathrm{mon}}_{k,f}$ consistent
with these examples if and only if there is a k-coloring of the graph. We assume that $G$ contains
no isolated vertices since such vertices do not affect the coloring of the graph.

We denote the $n$ variables as follows. There is one variable $x_i$ for each vertex $i \in V$, and
$k-2$ variables $w^1_{ij}, w^2_{ij}, \ldots, w^{k-2}_{ij}$ for each edge $(i,j) \in E$. Thus, for each edge $(i,j) \in E$,
we have a set $W_{ij}$ of $k$ associated variables $\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{k-2}_{ij}\}$. We add the $w_{ij}$'s so
that ultimately any hypothesis consistent with the examples we define must contain $x_i$ and $x_j$
in different terms if $(i,j) \in E$. For convenience, we use the following notation to denote an
example that consists of 1's in all bits except those specified by a set of variables $W$.

• For $W$ a collection of variables, let $g(W)$ be the example $v$ such that $v(x) = 0$ for $x \in W$
and $v(x) = 1$ for $x \notin W$. Recall that $v(x)$ is the bit of $v$ corresponding to variable $x$.

For $l \in \{1, \ldots, k\}$ and $(i,j) \in E$, let $S^l_{ij} = \{g(W) : W \subseteq W_{ij}, |W| = l\}$. That is, set $S^l_{ij}$ is the
set of examples $v = g(W)$ for $W$ a subset of size $l$ of the set $\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{k-2}_{ij}\}$. We
now define $k$ sets of examples as follows:

$S^1 = \{S^1_{ij} : (i,j) \in E\}$,
$S^2 = \{S^2_{ij} : (i,j) \in E\}$,
...
$S^k = \{S^k_{ij} : (i,j) \in E\}$,

such that $v \in S^l$, $1 \leq l \leq k$, is a positive example if and only if $f(k-l) = 1$. That is, for each
edge $(i,j) \in E$, each $S^l$ contains $\binom{k}{l}$ examples corresponding to that edge. Each $v \in S^l$ has
exactly $l$ bits set to 0, where the $l$ variables corresponding to these bits are chosen from some
set $W_{ij}$. If $f$ is true when exactly $k-l$ terms are true (i.e., $f(k-l) = 1$), then we label all
vectors in $S^l$ as positive examples; otherwise we label them as negative examples. For example,
if $f$ is the XOR function and $k$ is even, then all examples in $S^1, S^3, \ldots$ are labeled as positive
and those in $S^2, S^4, \ldots$ are labeled as negative.

We now show that there exist monotone terms $T_1, T_2, \ldots, T_k$ such that $f(T_1, T_2, \ldots, T_k)$ is
consistent with these examples if and only if there is a k-coloring of the graph $G$.

($\Leftarrow$) Given a k-coloring of the graph, then for each vertex $i$ which is colored $l$, place $x_i$ in term
$T_l$. Then for each edge $(i,j)$, variables $x_i$ and $x_j$ appear in different terms. Now arbitrarily
place the remaining $k-2$ variables associated with this edge (the $w_{ij}$'s) into the remaining $k-2$
terms such that each term receives exactly one variable. Thus for each edge $(i,j)$, each of the
associated variables $\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{k-2}_{ij}\}$ occurs in a different term. So for any example
in $S^l$, exactly $l$ terms are false and $k-l$ terms are true. Since the examples in $S^l$ are positive
exactly when $f(k-l) = 1$, the concept $f(T_1, T_2, \ldots, T_k)$ classifies all examples correctly.
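
The example sets and the terms built from a coloring can be sketched as follows. This is a small illustration under assumptions made for the example: variables are named by tuples such as ("x", i) and ("w", (i, j), t), colors are numbered 0 to k-1, the symmetric function is passed as `f(t)`, its value when exactly t inputs are 1, and an example is represented by the set of variables it sets to 0.

```python
from itertools import combinations

def edge_variables(edge, k):
    """The k variables W_ij associated with an edge."""
    i, j = edge
    return [("x", i), ("x", j)] + [("w", edge, t) for t in range(1, k - 1)]

def build_examples(edges, k, f):
    """Map each example (frozenset of zeroed variables) to its label:
    an example with l zeros from some W_ij is labeled f(k - l)."""
    examples = {}
    for edge in edges:
        for l in range(1, k + 1):
            for zeros in combinations(edge_variables(edge, k), l):
                examples[frozenset(zeros)] = f(k - l)
    return examples

def terms_from_coloring(edges, k, coloring):
    """Place x_v in the term of v's color, then spread each edge's w
    variables over the terms not used by its two endpoints."""
    terms = [set() for _ in range(k)]
    for v, c in coloring.items():
        terms[c].add(("x", v))
    for edge in edges:
        i, j = edge
        free = [c for c in range(k) if c not in (coloring[i], coloring[j])]
        for t, c in zip(range(1, k - 1), free):
            terms[c].add(("w", edge, t))
    return terms

def classify(terms, f, zeros):
    """A monotone term is true iff none of its variables is set to 0."""
    return f(sum(1 for term in terms if not (term & zeros)))

def consistent(terms, f, examples):
    return all(classify(terms, f, z) == y for z, y in examples.items())
```

For a triangle with k = 3 and f the parity function, a proper 3-coloring yields terms consistent with every example, while the term assignment produced from an improper coloring misclassifies the example that zeroes both endpoints of the monochromatic edge.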

($\Rightarrow$) Suppose we have $T_1, T_2, \ldots, T_k$ such that concept $c = f(T_1, T_2, \ldots, T_k)$ is consistent with
all the examples. Now color the vertices by the function $\chi : V \to \{1, 2, \ldots, k\}$ defined by $\chi(i) =
\min \{j : \text{variable } x_i \text{ occurs in term } T_j\}$. Lemma 1 guarantees we have a well defined function,
and Lemma 2 gives us a valid coloring. □

Lemma 1 Each variable $x_i$ occurs in some term.

Proof: Suppose that some $x_i$ does not occur in any term. Let $q = \min \{l : f(k-l) = 1 \text{ and } l >
0\}$. That is, $q$ is the smallest positive number of terms that can be false such that concept $c$ is
true. Note that $q$ is the least index such that $c(v) = 1$ for $v \in S^q$. We know that $q$ exists for
$f \notin \{\text{AND}, \text{FALSE}\}$.

Pick $j$ such that $(i,j) \in E$ (since we assumed that the graph has no isolated vertices, we know some
such $j$ exists). Now consider the positive example $v = g(\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\})$. If $x_i$ does
not occur in any term, then $u = g(\{x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\})$ satisfies the same number of terms
as $v$, and thus $c(u) = c(v) = 1$. But $u$ belongs to $S^{q-1}$, and we know all examples in $S^{q-1}$ are
negative examples by our definition of $q$ ($S^q$ is our first set of positive examples). Contradiction. □



Lemma 2 If $(i,j) \in E$ then $x_i$ and $x_j$ never occur in the same term.

Proof: Suppose that for $(i,j) \in E$, variables $x_i$ and $x_j$ occur in the same term. Again, let
us look at vectors in $S^q$ where $q = \min \{l : f(k-l) = 1 \text{ and } l > 0\}$. In particular, consider
the positive example $v = g(\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\})$. By Lemma 3, we know that exactly $q$
terms of $c$ are not satisfied by $v$. Then we know that each of these $q$ terms must contain at least
one variable of $\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\}$. If $x_i$ and $x_j$ occur in the same term, then we know
that some variable $x \in \{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\}$ occurs in at least two terms. Let $r$ be the
number of terms that variable $x$ appears in. We build a set $S$ of at most $q - r + 1$ variables such
that $u = g(S)$ also makes $q$ terms false. Initially let $S = \{x\}$. Then for each of the remaining
$q - r$ terms not satisfied by $v$, place into $S$ some variable from $\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\}$ which
appears in that term. Now consider example $u$. The terms not satisfied by $u$ are exactly those
not satisfied by $v$, so $c(u) = c(v) = 1$. Moreover, since $S \subset \{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\} \subseteq W_{ij}$,
example $u$ must lie in some set $S^l$ where $l < q$. But $S^q$ is our first set (the set of least index)
of positive examples, so $u$ must be negative. Contradiction. □

Lemma 3 Exactly $q$ terms of $c$ are not satisfied by $v$.

Proof: Suppose not. That is, suppose $r \neq q$ terms of $c$ are not satisfied by $v$. Since $v$ is a
positive example, $f(k-r) = 1$ and by definition of $q$ we have $r > q$. There are now two cases:

Case 1: $f(k-l) = 1$ for all $l \in \{q, q+1, \ldots, r\}$.

By definition of $q$, for any set $S \subset \{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\}$ of size $q-1$, $c(g(S)) = 0$. This
implies that each $u = g(S)$ satisfies at least $r - q + 1$ more terms of $\{T_1, \ldots, T_k\}$ than does
$v$. But this requires each variable in $\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\}$ to appear without any other
variable from this set in $r - q + 1$ terms. So there must exist $q(r - q + 1)$ terms not satisfied by
$v$. Since $r > q$ and $q \neq 1$ (we know $f(k-q) = 1$ but $f(k-1) = 0$), we have:

$r(q-1) > q(q-1)$
$rq - r > q^2 - q$
$q(r - q + 1) > r$.

Thus, more than $r$ terms are not satisfied by $v$. Contradiction.

Case 2: $f(k-l) = 0$ for some $l \in \{q+1, \ldots, r-1\}$.
Consider the sequence of examples:

$v_q = g(\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-2}_{ij}\})$,
$v_{q+1} = g(\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{q-1}_{ij}\})$,
...
$v_k = g(\{x_i, x_j, w^1_{ij}, w^2_{ij}, \ldots, w^{k-2}_{ij}\})$.

We assign values to $q_i$, $r_i$, and $l_i$ which maintain the following invariants: $q_i < l_i < r_i$, and
$f(k - q_i) = f(k - r_i)$ and $f(k - q_i) \neq f(k - l_i)$. Initially let $q_1 = q$, $r_1 = r$, and $l_1 = l$.
Initially, positive example $v_{q_1}$ fails to satisfy $r_1$ terms and there exists $l_1$ between $q_1$ and $r_1$ with
$f(k - l_1) = 0$. Thus negative example $v_{l_1}$ must fail to satisfy some $r_2 > r_1$ terms. Now let
$q_2 = l_1$ and $l_2 = r_1$, and so we have $f(k - q_2) = f(k - r_2) = 0$, $f(k - l_2) = 1$, and $q_2 < l_2 < r_2$.
Thus we know that positive example $v_{l_2}$ must fail to satisfy some $r_3 > r_2$ terms. Letting $q_3 = l_2$ and
$l_3 = r_2$, and continuing in this fashion, we find an increasing sequence $q_1, q_2, q_3, \ldots$ such that
each example $v_{q_i}$ fails to satisfy $r_i > q_i$ terms. At $q_i = k$, we have a contradiction. □

We have now finished proving Theorem 5. We now extend the proof to the general case in
which the terms $T_1, \ldots, T_k$ may be non-monotone.

Proof of Theorem 4: We show that learning $C^{\mathrm{mon}}_{k,f}$ by $C_{k,f}$ is NP-hard. This implies the theorem.
Given a graph $G = (V, E)$ we create a new graph $G'$ consisting of $k+1$ copies $G_1, \ldots, G_{k+1}$
of $G$. Clearly $G'$ is k-colorable if and only if $G$ is. We define examples in the same way as in the
proof of Theorem 5. We must now show that there exist (non-monotone) terms $T_1, T_2, \ldots, T_k$
such that $f(T_1, T_2, \ldots, T_k)$ is consistent with the examples if and only if there is a k-coloring
of the graph $G$. Given a k-coloring of the graph $G$, we can easily find a k-coloring of graph
$G'$. From this coloring, we can find $k$ terms such that $f(T_1, T_2, \ldots, T_k)$ is consistent with the
examples, using the same method as in the proof of Theorem 5. For the other direction, we
must show that if there are non-monotone terms $T_1, \ldots, T_k$ such that $f(T_1, \ldots, T_k)$ is consistent
with the examples, then $G$ is k-colorable. Notice that if any term $T_l$ has in it a negated variable
corresponding to a vertex or edge of some graph $G_q$, then $T_l$ is not satisfied by any example
corresponding to graph $G_r$ for $r \neq q$. If term $T_l$ has in it negated variables from more than one
graph $G_q$, then no examples satisfy term $T_l$, and thus the concept is equivalent to the concept
with term $T_l$ replaced by 0. If $T_l$ contains negated variables corresponding to a vertex or edge
of just one graph $G_q$, then we can replace term $T_l$ by 0 and mark graph $G_q$; this new concept is
still consistent with the examples corresponding to all unmarked graph copies. We continue this
procedure until all terms left have no negated variables. We never mark all the graph copies
since we mark at most one graph for each term that is set to 0, and there are more graphs than
terms. So, since each term left has no negated variables we can color any one of the remaining
unmarked graphs using the coloring given in the proof of Theorem 5. □

2.5 Conclusion 

We present an algorithm that learns the class $C_k$ of all concepts $f(T_1, \ldots, T_k)$ where $f$ is a $\{0, 1\}$-valued
function and $T_1, \ldots, T_k$ are monomials, using a hypothesis class of general DNF. We also
show that learning the class $C_{k,f}$ by $C_{k,f}$ where $f$ is a symmetric function is NP-hard, except
for $f \in \{\wedge, \neg\wedge, T, F\}$ for which learning is easy. We leave as open the problem of classifying
the learnability of $C_{k,f}$ by $C_{k,f}$ for more general functions $f$.

Chapter 3

Piecemeal learning of unknown environments



3.1 Introduction 

We address the situation where a robot, to perform a task better, must learn a complete map of
its environment. The robot's goal is to learn this map while satisfying the piecemeal constraint
that learning must be done "a piece at a time." Why might mobile robot exploration be done
piecemeal? Robots may have limited power, and after some exploration they may need to
recharge or refuel. In addition, robots may explore environments that are too risky or costly for
humans to explore, such as the inside of a volcano (e.g., CMU's Dante II robot), or a chemical
waste site, or the surface of Mars. In these cases, the robot's hardware may be too expensive
or fragile to stay long in dangerous conditions. Thus, it may be best to organize the learning
into phases, allowing the robot to return to a start position for refueling and maintenance.

The "piecemeal constraint" means that each of the robot's exploration phases must be of
limited duration. We assume that each exploration phase starts and ends at a fixed start
position. This special location might be a refueling station or a base camp. Between exploration
phases the robot might perform other unspecified tasks. Piecemeal learning thus enables
"learning on the job," since the phases of piecemeal learning can help the robot improve its
performance on the other tasks it performs. This is the "exploration/exploitation tradeoff":
spending some time exploring (learning) and some time exploiting what one has learned.

The piecemeal constraint can make efficient exploration surprisingly difficult. We first consider
piecemeal learning in environments that can be modeled as grid graphs with rectangular
obstacles. For these environments, we give two linear-time algorithms. The first algorithm,
the "wavefront" algorithm, can be viewed as an optimization of breadth-first search for our
problem. The second algorithm, the "ray" algorithm, can be viewed as a variation on depth-first
search. We then extend these results by giving a nearly linear algorithm for piecemeal
learning more complicated environments that can be modeled by arbitrary undirected graphs.
For piecemeal learning of these environments, we give some "approximate" breadth-first search
algorithms. We first give a simple algorithm that runs in $O(E + V^{1.5})$ time. We then improve
this algorithm and give a nearly linear time algorithm: it achieves $O(E + V^{1+o(1)})$ running time.
An interesting open problem is whether arbitrary, undirected graphs can be learned piecemeal
in linear time.

We now give a brief summary of the rest of this chapter. Section 3.2 gives some related
work on environment learning and mobile robot navigation. Section 3.3 formalizes our model.
Section 3.4 discusses piecemeal learning of arbitrary graphs, and the problems with some initial
approaches. Section 3.5 gives an approximate solution to the off-line version of this problem. In
addition, it gives our strategy for solving the problem we are interested in (the on-line version
of the problem). Section 3.6 introduces the notion of "city-block" graphs, discusses shortest
paths in such graphs, and gives two linear time algorithms for piecemeal learning these types
of graphs. Section 3.7 considers piecemeal learning of general graphs, and gives a nearly linear
algorithm for this problem. Section 3.8 gives an application of our algorithms to the problem
of finding a treasure in an unknown, potentially infinite graph. Finally, Section 3.9 concludes
with some open problems.

3.2 Related work 

Theoretical approaches to environment learning differ in how the robot's environment is modeled,
what types of sensors the robot has, the accuracy of the robot's sensors, whether the robot has
access to a teacher, and what the performance measure is. The robot's environment is often
modeled by a finite automaton, a directed graph, an undirected graph, or some special case of
the above. Typically, it is assumed that the robot knows what type of environment it is trying
to learn. The robot may have vision, or may have no long-range sensors whatsoever. Sometimes
the robot is assumed to have accurate sensors, and in other models the robot's sensors may be
noisy. Performance measures for the robot's accuracy vary from requiring the robot to always
output an exact map of the environment, to requiring that the robot output a good map with
high probability. Performance in terms of efficiency can be judged by either the total number
of steps taken by the robot, the number of queries the robot may have to ask of a teacher,
competitive ratios (e.g., the total number of steps the robot makes divided by the minimum
number of steps required had the robot known the environment), or some other measure.

Rivest and Schapire [70] study environments that can be modeled by strongly connected
deterministic finite automata. The robot gets information about the automaton by actively
experimenting in the environment and by observing input-output behavior. Rivest and Schapire
show that a robot with a teacher can with high probability learn such an environment. They
use homing sequences to improve Angluin's algorithm [f] to learn without using a "reset"
mechanism. Ron and Rubinfeld [71] further extend this result by giving an efficient algorithm
that with high probability learns finite automata with small cover time, without requiring a
teacher. Dean et al. [33] study the problem of learning finite automata when the output at
each state has some probability of being incorrect. They give an algorithm for learning finite
automata, assuming that the robot has access to a distinguishing sequence. Freund et al. [43]
give algorithms for learning "typical" deterministic finite automata from random walks.

Deng and Papadimitriou [35] and Betke [16] model the robot's environment as a directed
graph, with distinct and recognizable vertices and edges. They give a learning algorithm with
a constant competitive ratio when the graph is Eulerian or when the deficiency of the graph
is 1. For general graphs, they give a competitive ratio that is exponential in the deficiency of
the graph. Bender and Slonim [11] look at the more complicated case of directed graphs with
indistinguishable vertices. They show that a single robot with a constant number of pebbles
cannot learn such environments without knowing the size of the graph. On the other hand,
they give a probabilistic algorithm for two cooperating robots to learn such an environment.
Dudek et al. [38] study the easier problem of learning undirected graphs with indistinguishable
vertices, and give an algorithm for a robot with one or more markers to learn such an environment.

Deng, Kameda, and Papadimitriou [34] model environments such as "rooms" as polygons
with polygonal obstacles. They assume the robot has vision, and must learn a map of the
room. They show that if the polygon has an arbitrary number of polygonal obstacles in it,
then it is not possible to achieve a constant competitive ratio. For the simplified case of
a rectilinear room with no obstacles, they show a $2\sqrt{2}$ competitive algorithm for learning the
room. Kleinberg [52] improves this to a |\/2 competitive algorithm. For a rectilinear room
with at most $k$ obstacles, Deng et al. give an algorithm with $O(k)$ competitive ratio. They also
give constant competitive algorithms for environments that are modeled by general polygons
with a bounded number of obstacles, but the constant they give is large.

There has also been much theoretical work in the case where the robot's goal is to get from
one point to another in an unknown environment. The robot learns parts of the environment
as it is navigating, but its primary goal is to reach a particular location. In some cases, the
robot knows exactly where the goal location is, and in others it is assumed that the robot
will recognize the goal location.

Baeza-Yates, Culberson and Rawlins [8] study the cow path problem. The robot must search
for an object in an unknown location on 2 or more rays (the endpoints of the rays are at some
fixed start position). They give an optimal deterministic strategy for this problem. For the
case of 2 rays, they use a doubling strategy and get a competitive ratio of 9; they extend this
technique for $m$ rays and get a competitive ratio of $1 + 2(m^m/(m-1)^{m-1})$. Kao, Reif and
Tate [49] give a randomized algorithm for this problem that has better expected performance
than any deterministic algorithm. Kao, Ma, Sipser and Yin [48] give an optimal deterministic
search strategy for the case of multiple robots.
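
To make the doubling strategy for 2 rays concrete, the following is a minimal simulation sketch. The details are assumptions for illustration: the two rays are modeled as the positive and negative halves of a line, the first excursion has length 1, and the target is assumed to lie at distance at least 1 from the start. The function name is hypothetical.

```python
def doubling_search_cost(target):
    """Total distance walked before reaching `target`, a nonzero signed
    position on the line with abs(target) >= 1. The robot walks out 1 to
    the right, returns, walks out 2 to the left, returns, then 4, ..."""
    traveled = 0.0
    reach = 1.0
    direction = 1
    while True:
        same_side = (target > 0) == (direction > 0)
        if same_side and abs(target) <= reach:
            return traveled + abs(target)  # found during this excursion
        traveled += 2 * reach  # walk out and come back to the start
        reach *= 2
        direction = -direction
```

If the target at distance $d$ is just missed by an excursion of length $2^j$, the robot walks less than $2(1 + 2 + \cdots + 2^{j+1}) + d < 8 \cdot 2^j + d$ in total, which is below $9d$, matching the ratio of 9 quoted above (and the formula $1 + 2(m^m/(m-1)^{m-1})$ at $m = 2$).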

Papadimitriou and Yannakakis [62] consider the problem of a robot with vision moving around
in a plane filled with obstacles. The robot does not know its environment, but knows its exact
absolute location at all times, as well as its start position and its goal position. The robot's
goal is to travel from the start position to the goal position. Papadimitriou and Yannakakis show
that for the case of non-touching axis parallel rectangular obstacles, the competitive ratio is
$\Omega(\sqrt{n})$, where $n$ is the length of the shortest path between the start and goal locations. For
the case of square obstacles, they give a $\frac{1}{3}\sqrt{26} \approx 1.7$ competitive algorithm, and show that any
strategy must have competitive ratio greater than |.

Blum, Raghavan, and Schieber [22] also study the problem of point to point navigation in
an unknown two-dimensional geometric environment with convex obstacles. For the case of axis
parallel rectangular obstacles, they give an algorithm with competitive ratio $O(\sqrt{n})$, matching
the lower bound of Papadimitriou and Yannakakis. They also introduce and give an algorithm
for the room problem, where the goal of the robot is to go from a point on a wall of the room
to a specified point in the center of the room. The room contains axis parallel obstacles, but
the obstacles do not touch the sides of the wall. Bar-Eli, Berman, Fiat, and Yan [10] show that
any algorithm for this problem has competitive ratio $\Omega(\log n)$, and give an algorithm attaining
this bound.

Blum and Chalasani [21] consider the point to point problem in an unknown environment
when the robot makes repeated trips between two points. The goal of the robot is to find better
paths in each trip. In environments with axis parallel obstacles, they give an algorithm with
the property that at the $i$-th trip, the robot's path is $O(\sqrt{n/i})$ times the shortest path length.

Klein [51] considers the problem of a polygon with distinguished start and goal vertices.
The robot's goal is to walk inside the polygon from the start location to the goal location. The
goal location is recognized as soon as the robot sees it. For a special type of polygon known
as a street, Klein gives an algorithm with a $1 + \frac{3}{2}\pi \approx 5.71$ competitive ratio. Kleinberg [52]
improves this by giving an algorithm with competitive ratio $\sqrt{4 + \sqrt{8}} \approx 2.61$. For rectilinear
streets, the algorithm achieves a competitive ratio of $\sqrt{2}$.

There are many other related papers in the literature, particularly in the area of robotics
(e.g., [57]) and maze searching (e.g., [25, 24]). Rao, Kareti, Shi, and Iyengar [68] give a survey
of work on robot navigation in unknown terrains.

3.3 Formal model 

We model the robot's environment as a finite connected undirected graph $G = (V, E)$ with
distinguished start vertex $s$. Vertices represent accessible locations. Edges represent accessibility:
if $\{x, y\} \in E$ then the robot can move from $x$ to $y$, or back, in a single step.

We assume that the robot can always recognize a previously visited vertex; it never confuses
distinct locations. At any vertex the robot can sense only the edges incident to it; it has no
vision or long-range sensors. The robot can distinguish between incident edges at any vertex.
Each edge has a label that distinguishes it from any other edge. Without loss of generality,
we can assume that the edges are ordered. At a vertex, the robot knows which edges it has
traversed already. The robot only incurs a cost for traversing edges; thinking (computation) is
free. We also assume a uniform cost for an edge traversal. We consider the running time of a
piecemeal learning algorithm to be the number of edge traversals made by the robot.

The robot is given an upper bound $B$ on the number of steps it can make (edges it can
traverse) in one exploration phase. In order to assure that the robot can reach any vertex in
the graph, do some exploration, and then get back to the start vertex, we assume $B$ allows for
at least one round trip between $s$ and any other single vertex in $G$, and also allows for some
number of exploration steps. More precisely, we assume $B = (2 + \alpha) r$, where $\alpha > 0$ is some
constant, and $r$ is the radius of the graph (the maximum of all shortest-path distances between
$s$ and any vertex in $G$).
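
As a small illustration of these definitions, the radius $r$ and the phase budget $B = (2 + \alpha) r$ can be computed off-line from an adjacency list. This sketch assumes full knowledge of the graph, which the exploring robot does not have; it only makes the quantities concrete. The function names and the dictionary-based graph representation are assumptions for the example.

```python
from collections import deque

def radius_from(adj, s):
    """Largest shortest-path distance from s, computed by breadth-first
    search. adj maps each vertex to a list of its neighbors; the graph is
    assumed connected and undirected."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values())

def phase_budget(adj, s, alpha):
    """Step budget per exploration phase, B = (2 + alpha) * r."""
    return (2 + alpha) * radius_from(adj, s)
```

For a path graph on four vertices with $s$ at one end, the radius is 3, so $\alpha = 1$ gives a budget of 9 steps per phase: a round trip to the far end costs 6, leaving 3 steps for exploration.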

Initially all the robot knows is its starting vertex $s$, the bound $B$, and the radius $r$ of the
graph. The robot's goal is to explore the entire graph: to visit every vertex and traverse every
edge, minimizing the total number of edges traversed.

3.4 Initial approaches to piecemeal learning 

A simple approach to piecemeal learning of arbitrary undirected graphs is to use an ordinary
search algorithm, breadth-first search (BFS) or depth-first search (DFS), and just interrupt
the search as needed to return to visit $s$. (Detailed descriptions of BFS and DFS can be found
in algorithms textbooks [32].) Once the robot has returned to $s$, it goes back to the vertex at
which search was interrupted and resumes exploration. We now illustrate the problems each of
these approaches has for efficient piecemeal learning.

Depth-first search 

In depth-first search, edges are explored out of the most recently discovered vertex $v$ that still has
unexplored edges leaving it. When all of $v$'s edges have been explored, the search "backtracks"
to explore edges leaving the vertex from which $v$ was discovered. This process continues until all
edges are explored. This search strategy, without interruptions due to the piecemeal constraint,
is efficient since at most $2|E|$ edges are traversed. Interruptions, or exploration in phases of
limited duration, complicate matters. For example, suppose in the first phase of exploration, at
step $B/2$ of a phase the robot reaches a vertex $v$ as illustrated in Figure 3.1. Moreover, suppose
that the only path the robot knows from $s$ to $v$ has length $B/2$. At this point, the robot must
stop exploration and go back to the start location $s$. In the second phase, in order for the robot
to resume a depth-first search, it should go back to $v$, the most recently discovered vertex.
However, since the robot only knows a path of length $B/2$ to $v$, it cannot proceed with exploration
from that point.

Figure 3.1: The robot reaches vertex $v$ after $B/2$ steps in a depth-first search. Then it must
interrupt its search and return to $s$. It cannot resume exploration at $v$ to get to vertex $w$,
because the known return path is longer than $B/2$, the remaining number of steps allowed in
this exploration phase. DFS fails.

Since DFS with interruptions fails to reach all the vertices in the graph, another approach
to solve the piecemeal learning problem would be to try a bounded depth-first search strategy.
In bounded DFS, edges are explored out of the most recently discovered vertex v which had
depth less than a given bound β. However, a straightforward bounded DFS strategy also does
not translate into an efficient piecemeal learning algorithm for arbitrary undirected graphs.



Breadth-first search 

Unlike depth-first search, breadth-first search with interruptions does guarantee that all vertices
in the graph are ultimately explored. Whereas a DFS strategy cannot resume exploration at
vertices to which it only knows a long path, a BFS strategy can always resume exploration.
This is because BFS ensures that the robot always knows a shortest path from s to any explored
vertex. However, since a BFS strategy explores all the vertices at the same distance from s
before exploring any vertices that are further away from s, the resulting algorithm may not be
efficient. Note that in the usual BFS model, the algorithm uses a queue to keep track of which
vertex it will search from next. Thus, searching requires extracting a vertex from this queue.
In our model, however, since the robot can only search from its current location, extracting a
vertex from this queue results in a relocation from the robot's current location to the location
of the new vertex. Unlike the standard BFS model, our model does not allow the robot to
"teleport" from one vertex to another; instead, we consider a teleport-free exploration model,
where the robot must physically move from one vertex to the next.

In BFS, the robot may not move further away from the source than the unvisited vertex
nearest to the source. At any given time in the algorithm, let Δ denote the shortest-path
distance from s to the vertex the robot is visiting, and let δ denote the shortest-path distance
from s to the vertex nearest to s that is as yet unvisited. With traditional breadth-first search
we have Δ ≤ δ at all times. With teleport-free exploration, it is generally impossible to maintain
Δ ≤ δ without a great loss of efficiency:

Lemma 4 A robot which maintains Δ ≤ δ (such as a traditional BFS) may traverse Ω(|E|^2)
edges.

Proof: Consider the graph in Figure 3.2, where the vertices are {-n, -n + 1, ..., -1, s =
0, 1, 2, ..., n - 1, n}, and edges connect consecutive integers. To achieve Δ ≤ δ, a teleport-free
BFS algorithm would run in quadratic time, traveling back and forth from 1 to -1 to -2 to 2
to 3 .... ■
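The quadratic cost in this proof can be checked with a small simulation. The sketch below is only an illustration, not part of the thesis: the function name `teleport_free_bfs_cost` and the tie-breaking rule (visit the nearer of the two depth-d vertices first) are assumptions chosen to reproduce the order 1, -1, -2, 2, 3, ... described above.

```python
def teleport_free_bfs_cost(n):
    """Edges walked by a robot doing strict BFS on the path -n..n from s = 0.

    Assumption: at each depth the robot first visits the depth-d vertex on
    its own side, then walks across s to the one on the other side.
    """
    pos, cost = 0, 0
    for depth in range(1, n + 1):
        # Both vertices at this depth must be visited before going deeper.
        targets = (depth, -depth) if pos >= 0 else (-depth, depth)
        for target in targets:
            cost += abs(target - pos)  # the robot must physically walk there
            pos = target
    return cost


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, 2 * n, teleport_free_bfs_cost(n))  # n, |E|, edges walked
```

Each depth d costs one step to the near vertex plus 2d steps to the far one, so the total is n^2 + 2n. With |E| = 2n this is |E|^2/4 + |E|, which is quadratic in the number of edges.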



Figure 3.2: A simple graph for which the cost of BFS is quadratic in the number of edges.



3.5 Our approaches to piecemeal learning 

In this section, we discuss our approach to piecemeal learning of general graphs. First we
define the off-line version of this problem, and give an approximate solution for it, and then we
give a general method for converting certain types of search algorithms into piecemeal learning
algorithms.

3.5.1 Off-line piecemeal learning 

We now develop a strategy for the off-line piecemeal learning problem which we can adapt to 
get a strategy for the on-line piecemeal learning problem. 

In the off-line piecemeal learning problem, the robot is given a finite connected undirected
graph G = (V, E), a start location s ∈ V, and a bound B on the number of edges traversed in
any exploration phase. The robot's goal is to plan an optimal search of the graph that visits
every vertex and traverses every edge, and also satisfies the piecemeal constraint (i.e., each
exploration phase traverses at most B edges and starts and ends at the start location). Note
that since the graph is given, the problem does not actually have a learning or exploration
component. However, for simplicity we continue using "learning" and "exploration."

The off-line piecemeal learning problem is similar to the well-known Chinese Postman Problem
[39], but where the postman must return to the post-office every so often. (We could call
the off-line problem the Weak Postman Problem, for postmen who cannot carry much mail.)
The same problem arises when many postmen must cover the same city with their routes.

The Chinese Postman Problem can be solved by a polynomial time algorithm if the graph 
is either undirected or directed [39]. The Chinese Postman problem for a mixed graph that has 
undirected and directed edges was shown to be NP-complete by Papadimitriou [61]. We do not 
know an optimal off-line algorithm for the Weak Postman Problem; this may be an NP-hard 
problem. 

We now give an approximation algorithm for the off-line piecemeal learning problem using 
a simple "interrupted-DFS" approach. 

Theorem 6 There exists an approximate solution to the off-line piecemeal learning problem
for an arbitrary undirected graph G = (V, E) which traverses O(|E|) edges.

Proof: Assume that the radius of the graph is r and that the number of edges the robot is
allowed to traverse in each phase of exploration is B = (2 + α)r, for some constant α such that
αr is a positive integer. Before the robot starts traversing any edges in the graph, it looks at
the graph to be explored, and computes a depth-first search tree of the graph. A depth-first
traversal of this depth-first search tree defines a path of length 2|E| which starts and ends at s
and which goes through every vertex and edge in the graph. The robot breaks this path into
segments of length αr. The robot also computes (off-line) a shortest path from s to the start
of each segment.

The robot then starts the piecemeal learning of the graph. Each phase of the exploration
consists of taking a shortest path from s to the start of a segment, traversing the edges in the
segment, and taking a shortest path back to the start vertex. For each segment, the robot
traverses at most 2r edges to get to and from the segment, and αr edges to explore the segment
itself. Thus, since the total number of edge traversals for each segment is at most (2 + α)r = B,
the piecemeal constraint is satisfied. Since there are ⌈2|E|/(αr)⌉ segments, there are
⌈2|E|/(αr)⌉ - 1 interruptions, and the number of edge traversals due to interruptions is at most:

$$\left( \left\lceil \frac{2|E|}{\alpha r} \right\rceil - 1 \right) 2r \;\le\; \left( \frac{2|E|}{\alpha r} \right) 2r \;=\; \frac{4|E|}{\alpha}$$

Thus the total number of edge traversals is at most (4/α + 2)|E| = O(|E|). ■
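The interrupted-DFS construction in this proof can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the thesis's implementation: the graph is a simple undirected graph given as an adjacency dict, the names `bfs_dist`, `dfs_closed_walk`, and `offline_piecemeal` are hypothetical, and the parameter `alpha_r` stands for the segment length αr.

```python
from collections import deque


def bfs_dist(adj, s):
    """Shortest-path distance from s to every vertex."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def dfs_closed_walk(adj, s):
    """Closed walk from s that traverses every edge exactly twice (length 2|E|)."""
    walk, seen_vertices, seen_edges = [s], {s}, set()

    def visit(u):
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            walk.append(v)            # go out along the edge
            if v not in seen_vertices:
                seen_vertices.add(v)
                visit(v)              # tree edge: explore below v first
            walk.append(u)            # come back along the same edge

    visit(s)
    return walk


def offline_piecemeal(adj, s, alpha_r):
    """Return (list of phase lengths, bound B) for the interrupted-DFS plan."""
    dist = bfs_dist(adj, s)
    r = max(dist.values())            # radius of the graph as seen from s
    bound = 2 * r + alpha_r           # B = (2 + alpha) r
    walk = dfs_closed_walk(adj, s)
    steps = len(walk) - 1
    phases = []
    for i in range(0, steps, alpha_r):
        j = min(i + alpha_r, steps)
        # shortest path out, the segment itself, shortest path back
        phases.append(dist[walk[i]] + (j - i) + dist[walk[j]])
    return phases, bound


if __name__ == "__main__":
    square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(offline_piecemeal(square, 0, alpha_r=2))
```

Every phase length is at most B because both endpoints of a segment are within distance r of s, which is the piecemeal constraint argued above.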

3.5.2 On-line piecemeal learning 

We now show how we can change the strategy outlined above to obtain an efficient on-line 
piecemeal learning algorithm. 

We call an on-line search optimally interruptible if it always knows a shortest path back 
to s that can be composed from the edges that have been explored. We refer to a search as 
efficiently interruptible if it always knows a path back to s via explored edges of length at most 
the radius of the graph. 

Theorem 7 An efficiently interruptible algorithm for exploring an unknown graph G = (V, E)
with n vertices and m edges that takes time T(n, m) can be transformed into a piecemeal learning
algorithm that takes time O(T(n, m)).

Proof: The proof of this theorem is similar to the proof of Theorem 6. However, there are a
few differences. Instead of using an ordinary search algorithm (like DFS) and interrupting as
needed to return to s, we use an efficiently interruptible search algorithm. Moreover, the search
is on-line and is being interrupted during exploration. Finally, the cost of the search is not 2|E|
as in DFS, but at most T(n, m).

Assume that the radius of the graph is r and that the number of edges the robot is allowed
to traverse in each phase of exploration is B = (2 + α)r, for some constant α such that αr is a
positive integer. In each exploration phase, the robot will execute αr steps of the original search
algorithm. At the beginning of each phase the robot goes to the appropriate vertex to resume
exploration. Then the robot traverses αr edges as determined by the original search algorithm,
and finally the robot returns to s. Since the search algorithm is efficiently interruptible, the
robot knows a path of distance at most r from s to any vertex in the graph. Thus the robot
traverses at most 2r + αr = B edges during any exploration phase.

Since there are ⌈T(n, m)/(αr)⌉ segments, there are ⌈T(n, m)/(αr)⌉ - 1 interruptions, and the
number of edge traversals due to interruptions is:

$$\left( \left\lceil \frac{T(n,m)}{\alpha r} \right\rceil - 1 \right) 2r \;\le\; \left( \frac{T(n,m)}{\alpha r} \right) 2r \;=\; \frac{2\,T(n,m)}{\alpha}$$

Thus, the total number of edge traversals is at most T(n, m) + 2T(n, m)/α = T(n, m)(1 + 2/α) =
O(T(n, m)). ■

For arbitrary undirected planar graphs, we can show that any optimally interruptible search
algorithm requires Ω(|E|^2) edge traversals in the worst case. For example, exploring the graph
in Figure 3.2 (known initially only to be an arbitrary undirected planar graph) would result in
|E|^2 edge traversals if the search is required to be optimally interruptible.

Because it seems difficult to handle arbitrary undirected graphs efficiently, we first focus
our attention on a special class of undirected planar graphs. These graphs, known as city-block
graphs, are defined in Section 3.6.1. For these graphs we present two efficient O(|E|)
optimally interruptible search algorithms. Since an optimally interruptible search algorithm is
also an efficiently interruptible search algorithm, these two algorithms give efficient piecemeal
learning algorithms for city-block graphs. The wavefront algorithm is a modification of
breadth-first search that is optimized for city-block graphs. The ray algorithm is a variation on
depth-first search. For piecemeal learning of arbitrary undirected graphs, since optimally
interruptible search algorithms are not efficient, we look at efficiently interruptible search
algorithms. In particular, our algorithms are approximate breadth-first search algorithms.

3.6 Linear time algorithms for city-block graphs 

This section first defines and motivates the class of city-block graphs, and then develops some
useful properties of such graphs that will be used in Subsections 3.6.2 (which gives the wavefront
algorithm for piecemeal learning of a city-block graph) and 3.6.3 (which gives the ray algorithm).
Both the wavefront algorithm and the ray algorithm are optimally interruptible, and thus
maintain at all times knowledge of a shortest path back to s. Since BFS is optimally
interruptible, we study BFS in some detail to understand the characteristics of shortest paths in
city-block graphs. Our algorithms depend on the special properties that shortest paths have
in city-block graphs. We also study BFS because our wavefront algorithm is a modification of
BFS.

3.6.1 City-block graphs 

We model environments such as cities or office buildings in which efficient on-line robot
navigation may be needed. We focus on grid graphs containing some non-touching axis-parallel
rectangular "obstacles". We call these graphs city-block graphs. They are rectangular planar
graphs in which all edges are either vertical (north-south) or horizontal (east-west), and in which
all faces (city blocks) are axis-parallel rectangles whose opposing sides have the same number
of edges. A 1 × 1 face might correspond to a standard city-block; larger faces might correspond
to obstacles (parks or shopping malls). Figure 3.3 gives an example. City-block graphs are also
studied by Papadimitriou and Yannakakis [62], Blum, Raghavan, and Schieber [22], and Bar-Eli,
Berman, Fiat, and Yan [10].

An m × n city-block graph with no obstacles has exactly mn vertices (at points (i, j) for
1 ≤ i ≤ m, 1 ≤ j ≤ n) and 2mn - (m + n) edges (between points at distance 1 from each
other). Obstacles, if present, decrease the number of accessible locations (vertices) and edges
in the city-block graph. In city-block graphs the vertices and edges are deleted such that all
remaining faces are rectangles.
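The vertex and edge counts for the obstacle-free case are easy to confirm by enumeration. The following sketch is only an illustration; the function name `open_grid_edges` is an assumption, and obstacles are deliberately left out.

```python
def open_grid_edges(m, n):
    """All edges of an m x n city-block graph with no obstacles.

    Vertices are the grid points (i, j) with 1 <= i <= m and 1 <= j <= n;
    an edge joins two points at distance 1.
    """
    edges = []
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if i < m:
                edges.append(((i, j), (i + 1, j)))  # horizontal (east-west) edge
            if j < n:
                edges.append(((i, j), (i, j + 1)))  # vertical (north-south) edge
    return edges


if __name__ == "__main__":
    m, n = 3, 4
    print(m * n, len(open_grid_edges(m, n)), 2 * m * n - (m + n))
```

There are (m - 1)n horizontal and m(n - 1) vertical edges, which sum to 2mn - (m + n).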

We assume that the directions of incident edges are apparent to the robot.

Figure 3.3: A city-block graph with distinguished start vertex s.

Let δ(v, v') denote the length of the shortest path between v and v', and let d[v] denote
δ(v, s), the length of the shortest path from v back to s.



Monotone paths and the four-way decomposition 

A city-block graph can be usefully divided into four regions (north, south, east, and west) by four
monotone paths: an east-north path, an east-south path, a west-north path, and a west-south
path. The east-north path starts from s, proceeds east until it hits an obstacle, then proceeds
north until it hits an obstacle, then turns and proceeds east again, and so on. The other paths
are similar (see Figure 3.4). Note that all monotone paths are shortest paths. Furthermore,
note that s is included in all four regions, and that each of the four monotone paths (east-north,
east-south, west-north, west-south) is part of all regions to which it is adjacent.

In Lemma 5 we show that for any vertex, there is a shortest path to s through only one
region. Without loss of generality, we therefore only consider optimally interruptible search
algorithms that divide the graph into these four regions, and search these regions separately.
We only discuss what happens in the northern region; the other regions are handled similarly.






Figure 3.4: The four monotone paths and the four regions.

Lemma 5 There exists a shortest path from s to any point in a region that only goes through 
that region. 

Proof: Consider a point v in some region A. Let p be any shortest path from s to the point
v. If p is not entirely contained in region A, we can construct another path p' that is entirely
contained in region A. We note that the vertices and edges which make up the monotone paths
surrounding a region A are considered to be part of that region.

Since path p starts and ends in region A but is not entirely contained in region A, there
must be a point u that is on p and also on one of the monotone paths bordering A. Note that
u may be the same as v. Without loss of generality, let u be the last such point, so that the
portion of the path from u to v is contained entirely within region A. Then the path p' will
consist of the shortest path from s to u along the monotone path that u is on, followed by the
portion of p from u to v. This path p' is a shortest path from s to v because p was a shortest
path and p' can be no longer than p. ■



Canonical shortest paths of city-block graphs 

We now make a fundamental observation on the nature of shortest paths from a vertex v back
to s. In this section, we consider shortest paths in the northern region; properties of shortest
paths in other regions are similar.




Lemma 6 For any vertex v in the northern region, there is a canonical shortest path from v to 
the start vertex s which goes south whenever possible. The canonical shortest path goes east or 
west only when it is prevented from going south by an obstacle or by the monotone path defining 
the northern region. 

Proof: We call the length d[v] of the shortest path from v to s the depth of vertex v. We show 
this lemma by induction on the depth of a vertex. 

For the base case, it is easy to verify that any vertex v such that d[v] = 1 has a canonical
shortest path that goes south whenever possible.

For the inductive hypothesis, we assume that the lemma is true for all vertices that have
depth t - 1, and we want to show it is true for all vertices that have depth t. Consider a vertex p
at depth t. If there is an obstacle obstructing the vertex that is south of point p or if p is on a
horizontal segment of the monotone path defining the northern region, then it is impossible for
the canonical shortest path to go south, and the claim holds. Thus, assume the point south of
p is not obstructed by an obstacle or by the monotone path defining the northern region. Then
we have the following cases:

Case 1: Vertex p_s directly south of p has depth t - 1. In this case, there is clearly a
canonical shortest path from p to s which goes south from p to p_s and then follows the
canonical shortest path of p_s, which we know exists by the inductive assumption.

Case 2: Vertex p_s directly south of p has depth not equal to t - 1. Then one of the
remaining adjacent vertices must have depth t - 1 (otherwise it is impossible for p to have
depth t). Furthermore, none of these vertices has depth less than t - 1, for otherwise
vertex p would have depth less than t.

Note that the point directly north of p cannot have depth t - 1. If it did, then by the
inductive hypothesis, it has a canonical shortest path which goes south. But then p has
depth t - 2, which is a contradiction.

Thus, either the point west of p or the point east of p has depth t - 1. Without loss of
generality, assume that the point p_w west of p has depth t - 1. We consider two subcases.
In case (a), there is a path of length 2 from p_w to p_s that goes south one step from p_w,
and then goes east to p_s. In case (b), there is no such path.

Case (a): If there is such a path, the vertex directly south of p_w exists, and by the
inductive hypothesis has depth t - 2 (since there is a canonical shortest path from
p_w to s of length t - 1, the vertex directly to the south of p_w has depth t - 2). Then
p_s, which is directly east of this point, has depth at most t - 1 and thus there is a
canonical path from p to s which goes south whenever possible.

Case (b): Note that the only way there does not exist a path of length 2 from p_w to
p_s (other than the obvious one through p) is if p is a vertex on the northeast corner
of an obstacle which is bigger than 1 × 1. Suppose the obstacle is k_1 × k_2, where k_1 is
the length of the north (and south) side of the obstacle, and k_2 is the length of the
east (and west) side of the obstacle. We know by the inductive hypothesis that the
canonical shortest path from p_w goes either east or west along the north side of this
obstacle, and since the vertex p has depth t we know that the canonical shortest path
goes west. After having reached the corner, the canonical shortest path from p_w to s
proceeds south. Thus, the vertex which is on the southwest corner of this obstacle
has depth l = t - 1 - (k_1 - 1) - k_2. If we go from this vertex to p_s along the south side
of the obstacle and then along the east side of the obstacle, then the depth of point
p_s is at most l + k_1 + (k_2 - 1) = t - 1. Thus, in this case there is also a canonical
path from p to s which goes south whenever possible.



Lemma 7 Consider adjacent vertices v and w in a city-block graph where v is north of w. In 
the northern region, without loss of generality, d[v] = d[w] + 1. 

Proof: The proof follows immediately from Lemma 6. ■

Lemma 8 Consider adjacent vertices v and w in a city-block graph where v is west of w. In 
the northern region, without loss of generality, d[v] = d[w] ± 1. 

Proof: We prove the lemma by induction on the y-coordinate of the vertices in the northern
region. If v and w have the same y-coordinate as s, then we know that d[v] = d[w] + 1 if s is
east of v and d[v] = d[w] - 1 if s is west of w. Assume that the claim is true for vertices v
and w with y-coordinate k. In the following we show that it is also true for vertices v and w
with y-coordinate k + 1. We distinguish the case that there is no obstacle directly south of v
and w from the case that there is an obstacle directly south of v or w.

Case 1: If there is no obstacle directly south of v and w, or there is a 1 × 1 obstacle with v
and w on the north side, the lemma follows by Lemma 7 and the induction assumption.

Case 2: If there is an obstacle directly south of v or w, then we assume without loss of
generality that both v and w are on the boundary of the north side of the obstacle. (Note
that v or w may, however, be at a corner of the obstacle.)

If the lemma does not hold it means that d[v] = d[w] for two adjacent vertices v and w
(because, in any graph, the d values for adjacent vertices can differ by at most one). This
would also mean that all shortest paths from v to s must go through vertex v_w at the
north-west corner of the obstacle and all shortest paths from w to s must go through
vertex v_e at the north-east corner of the obstacle (v_w may be the same as v, and v_e may
be the same as w). However, we next show that there is a grid point m on the boundary
of the north side of the obstacle that has shortest paths through both v_e and v_w. The
claim of Lemma 8 follows directly.

The distance x between m and v_w can be obtained by solving the following equation:
x + d[v_w] = (k - x) + d[v_e], where k is the length of the north side of the obstacle. The
distance x is (k + d[v_e] - d[v_w])/2. Using the inductive hypothesis and Lemma 6, we know
that if k is even then |d[v_e] - d[v_w]| is even, and if k is odd then |d[v_e] - d[v_w]| is odd.
Thus the distance x is integral, and m exists in the graph.
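As a small worked example of this calculation, the sketch below solves x + d[v_w] = (k - x) + d[v_e] for x. The function name `meeting_point_offset` and its argument names are assumptions made for illustration.

```python
def meeting_point_offset(k, d_west, d_east):
    """Distance x from the north-west corner v_w to the meeting point m.

    k is the length of the north side of the obstacle, d_west = d[v_w] and
    d_east = d[v_e]. Solves x + d_west == (k - x) + d_east.
    """
    twice_x = k + d_east - d_west
    if twice_x % 2 != 0:
        # The proof argues this cannot happen in a city-block graph.
        raise ValueError("k and d_east - d_west must have the same parity")
    return twice_x // 2


if __name__ == "__main__":
    k, d_west, d_east = 4, 5, 7
    x = meeting_point_offset(k, d_west, d_east)
    # Both routes from m back to s have the same length.
    print(x, x + d_west, (k - x) + d_east)
```

For k = 4, d[v_w] = 5, and d[v_e] = 7, the meeting point lies 3 steps east of v_w, and both routes back to s have length 8.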



3.6.2 The wavefront algorithm 

The wavefront algorithm is based on BFS, but overcomes the inefficiency BFS has due to
relocation cost. In this section, we first develop some preliminary concepts and results based
on an analysis of breadth-first search in city-block graphs. We then present the wavefront
algorithm, prove its correctness, and show that it runs in linear time.

Properties of BFS in city-block graphs 

In city-block graphs, BFS can be viewed as exploring the graph in waves that expand outward
from the start vertex s, much as waves expand from a pebble thrown into a pond. Figure 3.5
illustrates the wavefronts that can arise.

Figure 3.5: Environment explored by breadth-first search, showing only "wavefronts" at odd
distance to s.



A wavefront w can then be defined as an ordered list of explored vertices
(v_1, v_2, ..., v_m), m ≥ 1, such that d[v_i] = d[v_1] for all i, and such that δ(v_i, v_{i+1}) ≤ 2 for
all i. (As we shall prove in Lemma 9, the distance between adjacent points in a wavefront is
always exactly equal to 2.) We call d[w] = d[v_1] the distance of the wavefront.

There is a natural "successor" relationship between BFS wavefronts, as a wavefront at
distance t generates a successor at distance t + 1. We informally consider a wave to be a
sequence of successive wavefronts. Because of obstacles, however, a wave may split (if it hits
an obstacle) or merge (with another wave, on the far side of an obstacle). Two wavefronts are
sibling wavefronts if they each have exactly one endpoint on the same obstacle and if the waves
to which they belong merge on the far side of that obstacle. The point on an obstacle where the
waves first meet is called the meeting point m of the obstacle. In the northern region, meeting
points are always on the north side of obstacles, and each obstacle has exactly one meeting
point on its northern side. See Figure 3.6.

Figure 3.6: Splitting and merging of wavefronts along a corner of an obstacle. Illustration
of meeting point and sibling wavefronts: w_1 and w_2 are sibling wavefronts which belong to
different "waves." The waves merge at the meeting point.



Lemma 9 A wavefront can only consist of diagonal segments. 

Proof: By definition a wavefront is a sequence of vertices at the same distance to s for which the
distance between adjacent vertices is at most 2. It follows from Lemmas 7 and 8 that neighboring
points in the grid cannot be in the same wavefront. Therefore, the distance between adjacent
vertices is exactly 2. Thus, the wavefront can only consist of diagonal segments. ■
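The key step of this proof, that two neighboring grid points never share a BFS distance, can be checked numerically on a small example. The sketch below is an illustration under stated assumptions: the builder `grid_with_obstacles`, the encoding of an obstacle as its corner grid points (x1, y1, x2, y2), and the helper names are all hypothetical. Everything strictly inside an obstacle rectangle is deleted and its boundary is kept.

```python
from collections import deque


def grid_with_obstacles(m, n, obstacles=()):
    """Adjacency dict of an m x n city-block graph (hypothetical encoding)."""
    def inside(px2, py2):
        # Point in doubled coordinates, so that edge midpoints stay integral.
        return any(2 * x1 < px2 < 2 * x2 and 2 * y1 < py2 < 2 * y2
                   for x1, y1, x2, y2 in obstacles)

    adj = {(i, j): [] for i in range(1, m + 1) for j in range(1, n + 1)
           if not inside(2 * i, 2 * j)}
    for (i, j) in adj:
        for a, b in ((i + 1, j), (i, j + 1)):
            # Keep an edge unless its midpoint lies strictly inside an obstacle.
            if (a, b) in adj and not inside(i + a, j + b):
                adj[(i, j)].append((a, b))
                adj[(a, b)].append((i, j))
    return adj


def depths(adj, s):
    """d[v] for every vertex v, computed by ordinary BFS from s."""
    d = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                queue.append(v)
    return d


def adjacent_depths_differ_by_one(adj, s):
    """True if d[u] and d[v] differ by exactly 1 across every edge."""
    d = depths(adj, s)
    return all(abs(d[u] - d[v]) == 1 for u in adj for v in adj[u])


if __name__ == "__main__":
    graph = grid_with_obstacles(8, 8, [(2, 2, 4, 5), (5, 4, 7, 6)])
    print(adjacent_depths_differ_by_one(graph, (4, 1)))
```

Since no edge joins two vertices at the same depth, consecutive vertices of a wavefront cannot be grid neighbors, which is what forces the diagonal shape.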

We call the points that connect diagonal segments (of different orientation) of a wavefront
peaks or valleys. In the northern region, a peak is a vertex on the wavefront that has a larger
y-coordinate than the y-coordinates of its adjacent vertices in the wavefront, and a valley is a
vertex on the wavefront that has a smaller y-coordinate than the y-coordinates of its adjacent
vertices (see Figure 3.7).

The initial wavefront is just a list containing the start point s. Until a successor of the initial
wavefront hits an obstacle, the successor wavefronts in the northern region consist of two
diagonal segments connected by a peak. This peak is at the same x-coordinate for these successive
wavefronts. Therefore, we say that the shape of the wavefronts does not change. In the northern
region a wavefront can only have descendants that have a different shape if a descendant curls
around the northern corners of an obstacle, or if it merges with another wavefront, or if it splits
into other wavefronts. These descendants may then have more complicated shapes.

Figure 3.7: Shapes of wavefronts. Illustration of peaks and valleys, and front and back of an
obstacle. The meeting point is the lowest point in the valley.



A wavefront w splits whenever it hits an obstacle. That is, if a vertex v_i in the wavefront
is on the boundary of an obstacle, w splits into wavefronts w_1 = (v_1, v_2, ..., v_i) and w_2 =
(v_i, v_{i+1}, ..., v_m). Wavefront w_1 propagates around the obstacle in one direction, and wavefront
w_2 propagates around in the other direction. Eventually, some descendant wavefront of w_1 and
some descendant wavefront of w_2 will have a common point on the boundary of the obstacle:
the meeting point. The position of the meeting point is determined by the shape of the wave
approaching the obstacle. (In the proof of Lemma 8, vertex m is a meeting point and we showed
how to calculate its position once the length k of the north side of the obstacle and the shortest
path distances of the vertices v_e and v_w at the north-east and north-west corners of the obstacle
are known: the distance from v_w to the meeting point m is (k + d[v_e] - d[v_w])/2.)

In the northern region, the front of an obstacle is its south side, the back of an obstacle is
its north side, and the sides of an obstacle are its east and west sides. A wave always hits the
front of an obstacle first. Consider the shape of a wave before it hits an obstacle and its shape
after it passes the obstacle. If a peak of the wavefront hits the obstacle (but not at a corner),
this peak will not be part of the shape of the wave after it "passes" the obstacle. Instead, the
merged wavefront may have one or two new peaks which have the same x-coordinates as the
sides of the obstacle (see Figure 3.7). The merged wavefront has a valley at the meeting point
on the boundary of the obstacle.

Description of the wavefront algorithm

The wavefront algorithm, presented in this section, mimics BFS in that it computes exactly
the same set of wavefronts. However, in order to minimize relocation costs, the wavefronts
may be computed in a different order. Rather than computing all the wavefronts at distance t
before computing any wavefronts at distance t + 1 (as BFS does), the wavefront algorithm will
continue to follow a particular wave persistently, before it relocates and pushes another wave
along.

We define expanding a wavefront w = (v_1, v_2, ..., v_l) as computing a set of zero or more
successor wavefronts by looking at the set of all unexplored vertices at distance one from any
vertex in w. Every vertex v in a successor wavefront has d[v] = d[w] + 1. The robot starts
with the vertex on one end of the wavefront and moves to all of its unexplored adjacent vertices.
The robot then moves to the next vertex in the wavefront and explores its adjacent unexplored
vertices. It proceeds this way down the vertices of the wavefront.

The following lemma shows that a wavefront of l vertices can be expanded in time O(l).

Lemma 10 A robot can expand a wavefront w = (v_1, v_2, ..., v_l) by traversing at most
2(l - 1) + 2⌈l/2⌉ + 4 edges.

Proof: To expand a wavefront w = (v_1, v_2, ..., v_l) the robot needs to move along each vertex in
the wavefront and find all of its unexplored neighbors. This can be done efficiently by moving
along pairs of unexplored edges between vertices in w. These unexplored edges connect l of
the vertices in the successor wavefront. This results in at most 2(l - 1) edge traversals, since
neighboring vertices are at most 2 apart. The successor wavefront might have l + 2 vertices,
and thus at the beginning and the end of the expansion (i.e., at vertices v_1 and v_l), the robot
may have to traverse an edge twice. In addition, at any vertex which is a peak, the robot may
have to traverse an edge twice. Note that a wavefront has at most ⌈l/2⌉ peaks. Thus, the total
number of edge traversals is at most 2(l - 1) + 2⌈l/2⌉ + 4. ■
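The linear running time follows directly from the bound in Lemma 10. As a short check (this step is not spelled out in the text), using ⌈l/2⌉ ≤ (l + 1)/2:

```latex
2(l - 1) + 2\left\lceil \tfrac{l}{2} \right\rceil + 4
  \;\le\; 2(l - 1) + (l + 1) + 4
  \;=\; 3l + 3
  \;=\; O(l).
```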

Since our algorithm computes exactly the same set of wavefronts as BFS, but persistently
pushes one wave along, it is important to make sure the wavefronts are expanded correctly.
There is really only one incorrect way to expand a wavefront and get something other than
what BFS obtained as a successor: to expand a wavefront that is touching a meeting point
before its sibling wavefront has merged with it. Operationally, this means that the wavefront
algorithm is blocked in the following two situations:

(a) It cannot expand a wavefront from the side around to the back of an obstacle before 
the meeting point for that obstacle has been set (see Figure 3.8). 

(b) It cannot expand a wavefront that touches a meeting point until its sibling has arrived 
there as well (see Figure 3.9). 

A wavefront w_2 blocks a wavefront w_1 if w_2 must be expanded before w_1 can be safely expanded.
We also say w_2 and w_1 interfere.




Figure 3.8: Blockage of w_1 by w_2. Wavefront w_1 has finished covering one side of the obstacle
and the meeting point is not set yet.

Figure 3.9: Blockage of w_1 by w_2. Wavefront w_1 has reached the meeting point on the obstacle,
but the sibling wavefront w_2 has not.



A wavefront w is an expiring wavefront if its descendant wavefronts can never interfere with
the expansion of any other wavefronts that now exist or any of their descendants. A wavefront w
is an expiring wavefront if its endpoints are both on the front of the same obstacle; w will expand
into the region surrounded by the wavefront and the obstacle, and then disappear or "expire."
We say that a wavefront expires if it consists of just one vertex with no unexplored neighbors.

Figure 3.10: Triangular areas (shaded) delineated by two expiring wavefronts.



Procedure Wavefront-algorithm is an efficient optimally interruptible search algorithm
that can be used to create an efficient piecemeal learning algorithm. It repeatedly expands one
wavefront until it splits, merges, expires, or is blocked. The Wavefront-algorithm takes as
input a start point s and the boundary coordinates of the environment. It calls procedure
Create-monotone-paths to explore four monotone paths (see Section 3.6.1) and define the
four regions. Then procedure Explore-area is called for each region.



Wavefront-algorithm (s, boundary)
1  create monotone paths
2  For region = north, south, east, and west
3      initialize current wavefront w := (s)
4      Explore-area (w, region)
5      take a shortest path to s





For each region we keep an ordered list L of all the wavefronts to be expanded. In the northern
region, the wavefronts are ordered by the x-coordinate of their west-most point. Neighboring
wavefronts are wavefronts that are adjacent in the ordered list L of wavefronts. Note that for
each pair of neighboring wavefronts there is an obstacle on which both wavefronts have an
endpoint.
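
One way to picture the ordered list L is the following Python sketch. It is only an illustration under stated assumptions: the class name `WavefrontList`, its methods, and the representation of a wavefront as a list of (x, y) grid vertices are hypothetical and do not come from the text. The sketch keeps wavefronts sorted by the x-coordinate of their west-most vertex and looks up the neighbor of a wavefront in a given direction of expansion.

```python
import bisect


class WavefrontList:
    """Wavefronts of one region, kept sorted by west-most x-coordinate."""

    def __init__(self):
        self._keys = []   # west-most x-coordinate of each wavefront
        self._items = []  # the wavefronts themselves, in the same order

    def insert(self, wavefront):
        # A wavefront is assumed to be a non-empty list of (x, y) vertices.
        key = min(x for x, _y in wavefront)
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._items.insert(i, wavefront)

    def _index(self, wavefront):
        # Identity lookup, so two wavefronts with equal vertices stay distinct.
        for i, w in enumerate(self._items):
            if w is wavefront:
                return i
        raise ValueError("wavefront is not in the list")

    def remove(self, wavefront):
        i = self._index(wavefront)
        del self._keys[i]
        del self._items[i]

    def neighbor(self, wavefront, direction):
        # "west-to-east" looks one position east, otherwise one position west.
        i = self._index(wavefront)
        j = i + 1 if direction == "west-to-east" else i - 1
        return self._items[j] if 0 <= j < len(self._items) else None

    def __len__(self):
        return len(self._items)


a = [(0, 3), (1, 2)]
b = [(5, 2), (6, 3)]
c = [(9, 4)]
wavefronts = WavefrontList()
for w in (b, c, a):
    wavefronts.insert(w)
```

In this example the wavefronts are inserted out of order, and `wavefronts.neighbor(b, "west-to-east")` returns `c` while `wavefronts.neighbor(b, "east-to-west")` returns `a`.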

Initially, we expand each wavefront in the northern region from its west-most endpoint to
its east-most endpoint (i.e., we are expanding wavefronts in a "west-to-east" manner). The
direction of expansion changes for the first time in the northern region when a wavefront is
blocked by a wavefront to its west (the direction of expansion then becomes "east-to-west"). In
fact, the direction of expansion changes each time a wavefront is blocked by a wavefront that
is in the direction opposite of expansion. We introduce this notion of expanding wavefronts
in either "west-to-east" or "east-to-west" directions in order to simplify the analysis of the
algorithm.

We treat the boundaries as large obstacles. The north region has been fully explored when
the list L of wavefronts is empty. Note that vertices on the monotone paths are considered
initially to be unexplored, and that expanding a wavefront returns a successor that is entirely
within the same region.

Each iteration of Explore-area expands a wavefront. When Expand is called on a wavefront w,
the robot starts expanding w from its current location, which is a vertex at one of the
endpoints of wavefront w. It is often convenient, however, to think of Expand as finding the
unexplored neighbors of the vertices in w in parallel.

Depending on what happens during the expansion, the successor wavefront can be split,
merged, blocked, or may expire. Note that more than one of these cases may apply.

Procedures Merge and Split (see following pages) handle the (not necessarily disjoint)
cases of merging and splitting wavefronts. Note that we use call-by-reference conventions for
the wavefront w and the list L of wavefronts (that is, assignments to these variables within
procedures Merge and Split affect their values in procedure Explore-area). Each time
procedure Relocate(w, dir) is called, the robot moves from its current location to the appropriate
endpoint of w: in the northern region, if the direction is "west-to-east" the robot moves
to the west-most vertex of w, and if the direction is "east-to-west" the robot moves to the
east-most vertex of w.

Explore-area (w, region)
1   initialize list of wavefronts L := (w)
2   initialize direction dir := west-to-east
3   Repeat
4       Expand current wavefront w to successor wavefront w_s
5       Relocate (w_s, dir)
6       current wavefront w := w_s
7       If w is a single vertex with no unexplored neighboring vertices
8           Then
9               remove w from ordered list L of wavefronts
10              If L is not empty
11                  Then
12                      w := neighboring wavefront of w in direction dir
13                      Relocate (w, dir)
14          Else
15              replace w by w_s in ordered list L of wavefronts
16              If the second back corner of any obstacle(s) has just been explored
17                  Then set meeting points for those obstacle(s)
18              If w can be merged with adjacent wavefront(s)
19                  Then Merge (w, L, region, dir)
20              If w hits obstacle(s)
21                  Then Split (w, L, region, dir)
22      If L not empty
23          Then
24              If w is blocked by neighboring wavefront w' in direction D ∈ {west-to-east, east-to-west}
25                  Then
26                      dir := D
27                      While w is blocked by neighboring wavefront w'
28                          Do
29                              w := w'
30                              Relocate (w, dir)
31  Until L is empty

Procedure Relocate(w, dir) can be implemented so that when it is called, the robot simply
moves from its current location to the appropriate endpoint of w via a shortest path
in the explored area of the graph. However, for analysis purposes, we assume that when
Relocate(w, dir) is called the robot moves from its current location to the appropriate
endpoint of w as follows.

• When procedure Relocate($w_s$, dir) is called in line 5 of Explore-area, the robot
traverses edges between the vertices in wavefront $w_s$ to get back to the appropriate endpoint
of the newly expanded wavefront.

• When procedure Relocate(w, dir) is called in line 13 of Explore-area, the robot
traverses edges along the boundary of an obstacle.

• When procedure Relocate(w, dir) is called in line 9 of Merge, the robot traverses
edges between vertices in wavefront w to get to the appropriate endpoint of the newly
merged wavefront.

• When procedure Relocate(w, dir) is called in line 30 of Explore-area, the robot
traverses edges as follows. Suppose the robot is in the northern region and at the west-most
vertex of wavefront $w_0$, and assume that w is to the east of $w_0$. Note that both $w_0$
and w are in the current ordered list of wavefronts L. Thus there is a path between the
robot's current location and wavefront w which "follows the chain" of wavefronts between
$w_0$ and w. That is, the robot moves from $w_0$ to w as follows. Let $w_1, w_2, \ldots, w_k$ be the
wavefronts in the ordered list of wavefronts between $w_0$ and w, and let $b_0, b_1, \ldots, b_k$
be the obstacles separating wavefronts $w_0, w_1, \ldots, w_k, w$ (i.e., obstacle $b_0$ is between $w_0$
and $w_1$, obstacle $b_1$ is between $w_1$ and $w_2$, and so on). Then to relocate from $w_0$ to w, the
robot traverses the edges between vertices of wavefront $w_0$ to get to the east-most vertex
of $w_0$, which is on obstacle $b_0$. Then the robot traverses the edges of obstacle $b_0$ to get
to the west-most vertex of $w_1$, and then the robot traverses the edges between vertices
in wavefront $w_1$ to get to the east-most vertex of $w_1$, which is on obstacle $b_1$. The robot
continues traversing edges in this manner (alternating between traversing wavefronts and
traversing obstacles) until it is at the appropriate end vertex of wavefront w.
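
The alternation between wavefronts and obstacles in a chain-following relocation can be sketched in Python as below. This is a minimal illustration, not the thesis's procedure: the function name `chain_route` and the string labels are hypothetical, the wavefronts are assumed to be given in west-to-east order starting with the robot's current wavefront and ending with the target, and the robot is assumed to stop at the west-most vertex of the target (the appropriate endpoint when the direction is west-to-east).

```python
def chain_route(wavefronts, obstacles):
    """List the segments walked when relocating eastward along a chain.

    wavefronts: [w0, w1, ..., wk, w], the robot starts at the west end of w0.
    obstacles:  obstacles[i] separates wavefronts[i] and wavefronts[i + 1].
    """
    if len(obstacles) != len(wavefronts) - 1:
        raise ValueError("need exactly one obstacle between neighboring wavefronts")
    route = []
    for i, obstacle in enumerate(obstacles):
        # Walk along the current wavefront to its east-most vertex ...
        route.append(("wavefront", wavefronts[i]))
        # ... then along the obstacle boundary to the next wavefront's west end.
        route.append(("obstacle", obstacle))
    return route


route = chain_route(["w0", "w1", "w"], ["b0", "b1"])
```

Each wavefront and each obstacle in the chain appears once in the resulting route, which is the property the later edge-counting argument relies on.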



Merge (w, L, region, dir)
1  remove w from list L of wavefronts
2  While there is a neighboring wavefront w' with which w can merge
3      Do
4          remove w' from list L of wavefronts
5          merge w and w' into wavefront w"
6          w := w"
7  put w in ordered list L of wavefronts
8  If w is not blocked
9      Then Relocate (w, dir)

Wavefronts are merged when exploration continues around an obstacle. A wavefront can be
merged with two wavefronts, one on each end.

When procedure Split is called on wavefront w, we note that the wavefront is either the
result of calling procedure Expand in line 4 of Explore-area or the result of calling procedure
Merge in line 19 of Explore-area. Once wavefront w is split into $w_0, \ldots, w_n$, we update the
ordered list L of wavefronts, and update the current wavefront.



Split (w, L, region, dir)
1  split w into appropriate wavefronts w_0, ..., w_n in standard order
2  remove w from ordered list L of wavefronts
3  For i = 0 To n
4      put w_i on ordered list L of wavefronts
5  If dir = west-to-east
6      Then w := w_0
7      Else w := w_n
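
The list bookkeeping performed by Split can be sketched in Python as follows. This is only an illustration: the function name `apply_split` is hypothetical, the ordered list L is modeled as a plain Python list, and the pieces are assumed to be supplied already in west-to-east order and to occupy the position that w held in the list. How w is geometrically cut into pieces is not modeled.

```python
def apply_split(ordered, w, pieces, direction):
    """Replace w by its pieces in the ordered list and pick the next wavefront."""
    # Locate w by identity and splice the pieces into its position.
    i = next(k for k, x in enumerate(ordered) if x is w)
    ordered[i:i + 1] = pieces
    # Continue with the west-most piece when expanding west-to-east,
    # and with the east-most piece otherwise.
    return pieces[0] if direction == "west-to-east" else pieces[-1]


west, w, east = ["west"], ["w"], ["east"]
p0, p1 = ["p0"], ["p1"]
L = [west, w, east]
current = apply_split(L, w, [p0, p1], "west-to-east")
```

After the call, `L` is `[west, p0, p1, east]` and `current` is `p0`, mirroring lines 2 to 7 of the pseudocode.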













Correctness of the wavefront algorithm 

The following theorems establish the correctness of our algorithm. 

Theorem 8 The algorithm Explore-area expands wavefronts so as to maintain optimal
interruptibility.

Proof: This is shown by induction on the distance of the wavefronts. The key observations
are:

• There is a canonical shortest path from any vertex v to s which goes south whenever
possible, but east or west around obstacles.

• A wavefront is never expanded beyond a meeting point.

We show that the algorithm maintains optimal interruptibility by showing that it always knows
the canonical shortest path from any explored vertex to the start vertex s. We refer to this as
the shortest path property. We show that the algorithm maintains the shortest path property by
induction on the number of stages in the algorithm. Each stage of the algorithm is an expansion
of a wavefront.

The shortest path property is trivially true when the number of stages k = 1. There is
initially only one wavefront, the start point. Now we assume all wavefronts that exist just after
the k-th stage satisfy the shortest path property, and we want to show that all wavefronts that
exist just after the (k+1)-st stage also satisfy the shortest path property.

Consider a wavefront w in the k-th stage which the algorithm has expanded in the (k+1)-st
stage to $w_s$. We claim that all vertices in $w_s$ have shortest path length d[w] + 1. Note that
any vertex in $w_s$ which is directly north of a vertex in w definitely has shortest path length
d[w] + 1. This is because there is a shortest path from any vertex v to s which goes south
whenever possible, but if it is not possible to go south because of an obstacle, it goes east or
west around the obstacle.

The only time any vertex v in $w_s$ is not directly north of a vertex in w is when w is expanded
around the back of an obstacle. This can only occur for a vertex that is either the west-most or
east-most vertex of a wavefront in the north region. Without loss of generality we assume that
v is the west-most point on $w_s$ and v is on the boundary of some obstacle b. Note that w is
expanded around the back of an obstacle only when the meeting point is determined. Because
the algorithm only expands any wavefront until it reaches the meeting point of an obstacle,
vertex v is not to the west of the meeting point. The algorithm knows that v has a shortest
path from s that goes through $v_c$ and along the obstacle to v. Thus the algorithm satisfies the
shortest path property for the (k+1)-st stage. ■

Theorem 9 If the region is not completely explored, there is always a wavefront that is not
blocked.

Proof: We consider exploration in the north region. The key observations are:

• Neighboring wavefronts cannot simultaneously block each other.

• The east-most wavefront in the north region cannot be blocked by anything to its east,
and the west-most wavefront in the north region cannot be blocked by anything to its
west.

Thus the robot can always "follow a chain" of wavefronts to either its east or west to find an
unblocked wavefront.

A neighboring wavefront is either a sibling wavefront or an expiring wavefront. An expiring
wavefront can never block neighboring wavefronts. In order to show that neighboring wavefronts
cannot simultaneously block each other, it thus suffices to show next that sibling wavefronts
cannot block each other. We use this to show that we can always find a wavefront w' which
is not blocked. The unblocked wavefront w' nearest to the current wavefront w in the ordered
list of wavefronts L can be found by "following the chain" of blocked wavefronts from w to w'.
By following the chain of wavefronts between w and w' we mean that the robot must traverse
the edges that connect the vertices in each wavefront between w and w' in L and also the edges
on the boundaries of the obstacles between these wavefronts. Note that neighboring wavefronts
in list L each have at least one endpoint that lies on the boundary of the same obstacle.

Before we show that sibling wavefronts cannot block each other we need the following
terminology. The first time an obstacle is discovered by some wavefront, we call the point at
which the wavefront hits the obstacle the discovery point. (Note that there may be more than one
such point. We arbitrarily choose one of these points.) In the north region, we split up the
wavefronts adjacent to each obstacle into an east wave and a west wave. We call the set of all
these wavefronts which are between the discovery point and the meeting point of the obstacle
in a west-to-east manner the west wave. We define the east wave of an obstacle analogously.

The discovery point of an obstacle b is always at the front of b. The wavefront that hits
at b is split into two wavefronts, one of which is in the east wave and one of which is in the
west wave of the obstacle. We claim that a descendant wavefront $w_1$ in the west wave and
a descendant wavefront $w_2$ in the east wave cannot simultaneously block each other. Assume
that the algorithm is trying to expand $w_1$ but that wavefront $w_2$ blocks $w_1$. Wavefront $w_2$ can
only block $w_1$ if one of the following two cases applies. In both cases, we show that $w_1$ cannot
also block $w_2$.

Case 1: Wavefront $w_1$ is about to expand to the back of obstacle b, but both of the
back corners of obstacle b have not been explored, and thus the meeting point has not
been determined. Wavefront $w_2$ can only be blocked by $w_1$ if $w_2$ is either already at the
meeting point of the obstacle or about to expand to the back of the obstacle. Since none
of the back corners of obstacle b have been explored, neither of these two possibilities
holds. Thus, wavefront $w_1$ does not block $w_2$.

Case 2: Wavefront $w_1$ has reached the meeting point at the back of b. Therefore, both
back corners of the obstacle have been explored and $w_1$ is not blocking $w_2$.

We have just shown that if $w_2$ blocks $w_1$ then $w_1$ cannot also block $w_2$. Thus, the algorithm
tries to pick $w_2$ as the nearest unblocked wavefront to $w_1$. However, $w_2$ may be blocked by its
sibling wavefront $w_3$ on a different obstacle b'. For this case, we have to show that this sibling
wavefront $w_3$ is not blocked, or that its sibling wavefront $w_4$ on yet another obstacle b" is not
blocked, and so forth. Without loss of generality, we assume that the wavefronts are blocked
by wavefronts towards the east. Proceeding towards the east along the chain of wavefronts will
eventually lead to a wavefront which is not blocked: the east-most wavefront in the northern
region. The east-most wavefront is adjacent to the initial monotone east-north path. Therefore,
it cannot be blocked by a wavefront towards the east.
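
The chain-following argument can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name `nearest_unblocked` and the predicate `is_blocked` are hypothetical, the chain is taken to be the ordered list of wavefronts from west to east, and every blocked wavefront is assumed to be blocked by its eastern neighbor. The sketch walks east until it finds an unblocked wavefront, and treats a blocked east-most wavefront as a violation of the property argued above.

```python
def nearest_unblocked(ordered, start, is_blocked):
    """Return the index of the first unblocked wavefront at or east of start."""
    i = start
    while is_blocked(ordered[i]):
        if i == len(ordered) - 1:
            # The east-most wavefront borders the monotone path and, by the
            # argument in the text, cannot be blocked from the east.
            raise RuntimeError("east-most wavefront reported as blocked")
        i += 1
    return i


chain = ["w1", "w2", "w3", "w4"]
blocked = {"w1", "w2"}
index = nearest_unblocked(chain, 0, lambda w: w in blocked)
```

Here the walk starts at `w1`, passes `w2`, and stops at `w3`, so `index` is 2.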



Theorem 10 The wavefront algorithm is an optimally interruptible piecemeal learning
algorithm for city-block graphs.

Proof: To show the correctness of a piecemeal algorithm that uses our wavefront algorithm
for exploration with interruption, we show that the wavefront algorithm maintains the shortest
path property and explores the entire environment.

Theorem 8 shows by induction on shortest path length that the wavefront algorithm mimics
breadth-first search. Thus it is optimally interruptible.

Theorem 9 shows that the algorithm does not terminate until all vertices have been explored.
Correctness follows. ■

Efficiency of the wavefront algorithm 

We now show the number of edges traversed by the piecemeal algorithm based on the wavefront 
algorithm is linear in the number of edges in the city-block graph. 

We first analyze the number of edges traversed by the wavefront algorithm. Note that the
robot traverses edges when procedures Create-monotone-paths, Expand, and Relocate
are called. In addition, it traverses edges to get back to s between calls to Explore-area.
These are the only times the robot traverses edges. Thus, we count the number of edges
traversed for each of these cases. In Lemmas 11 to 14, we analyze the number of edges traversed
by the robot due to calls of Relocate. Theorem 11 uses these lemmas and calculates the total
number of edges traversed by the wavefront algorithm.

Lemma 11 An edge is traversed at most once due to relocations after a wavefront has expired
(Relocate in line 13 of Explore-area).

Proof: Assume that the robot is in the northern region and expanding wavefronts in a
west-to-east direction. Suppose wavefront w has just expired onto obstacle b (i.e., it is a single
vertex with all of its adjacent edges explored). The robot now must relocate along obstacle b to
its neighboring wavefront w' to the east. Note that w' is also adjacent to obstacle b, and therefore
the robot is only traversing edges on the obstacle b.

Note that at this point of exploration, there is no wavefront west of w which will expire
onto obstacle b. This is because expiring wavefronts are never blocked, and thus the direction
of expansion cannot be changed due to an expiring wavefront. So, when a wavefront is split and
the direction of expansion is west-to-east, the robot always chooses the west-most wavefront to
expand first. Thus, the wavefronts which expire onto obstacle b are explored in a west-to-east
manner. Thus relocations after wavefronts have expired on obstacle b continuously move east
along the boundary of this obstacle. ■

Lemma 12 An edge is traversed at most once due to relocations after wavefronts have merged
(Relocate in line 9 of Merge).

Proof: Before a call to procedure Merge, the robot is at the appropriate end vertex of
wavefront w. Assume that the robot is in the northern region and expanding wavefronts
in a west-to-east direction. Thus the robot is at the west-most vertex of wavefront w. Note that
wavefront w can be merged with at most two wavefronts, one at each end, but only merges with
the wavefront to the west of w actually cause the robot to relocate. Suppose wavefront w is
merged with wavefront w' to its west to form wavefront w". Then, if the resulting wavefront w"
is unblocked, procedure Relocate is called and the robot must traverse w" to its west-most
vertex (i.e., also the west-most vertex of w'). However, since wavefront w" is unblocked, w" can
immediately be expanded and is not traversed again. ■

Lemma 13 At most one wavefront from the east wave of an obstacle is blocked by one or more 
wavefronts in the west wave. At most one wavefront from the west wave is blocked by one or 
more wavefronts in the east wave. 

Proof: Consider the west wave of an obstacle. By the definition of blocking, there are only
two possible wavefronts in the west wave that can be blocked. One wavefront is adjacent to
the back corner of the obstacle. Call this wavefront $w_1$. The other wavefront is adjacent to the
meeting point of the obstacle. Call this wavefront $w_2$.

We first show that if $w_1$ is blocked then $w_2$ will not be blocked also. Then we also know
that if $w_2$ is blocked then $w_1$ must not have been blocked. Thus at most one wavefront in the
west wave is blocked.

If $w_1$ is blocked by one or more wavefronts in the east wave then these wavefronts can be
expanded to the meeting point of the obstacle without interference from $w_1$. That is, wavefront
$w_1$ cannot block any wavefront in the east wave, and thus there will be no traversals around
the boundary of the obstacle until the east wave has reached the meeting point. At this point,
the west wave can be expanded to the meeting point without any wavefronts in the east wave
blocking any wavefronts in the west wave.

Similarly, we know that at most one wavefront from the east wave is blocked by one or
more wavefronts in the west wave. ■

Lemma 14 An edge is traversed at most three times due to relocation after blockage
(Relocate in line 30 of Explore-area).

Proof: Without loss of generality, we assume that the wavefronts are blocked by wavefronts
towards the east. Proceeding towards the east along the chain of wavefronts will eventually
lead to a wavefront which is not blocked, since the east-most wavefront is adjacent to the initial
monotone east-north path.

First we show that any wavefront is traversed at most once due to blockage. Then we show
that the boundary of any obstacle is traversed at most twice due to blockage. Note that pairs
of edges connecting vertices in a wavefront may also be edges which are on the boundaries of
obstacles. Thus any edge is traversed at most three times due to relocation after blockage.

We know from Theorem 9 that there is always a wavefront that is not blocked. Assume that
the robot is at a wavefront w which is blocked by a wavefront to its east. Following the chain of
wavefronts to the east leads to an unblocked wavefront w'. This results in one traversal of the
wavefronts. Now this wavefront w' is expanded until it is blocked by some wavefront w". Note
that wavefront w" cannot be to the west of w', since we know that the wavefront west of w' is
blocked by w'. (We show in the proof of Theorem 9 that if $w_1$ blocks $w_2$ then $w_2$ does not block
$w_1$.) The robot will not move to any wavefronts west of wavefront w' until a descendant of w'
no longer blocks the wavefront immediately to its west. Once this is the case, the west
wavefront can immediately be expanded. Similarly, we go back through the chain of wavefronts,
since, as the robot proceeds west, it expands each wavefront in the chain. Thus the robot
never traverses any wavefront more than once due to blockage.

Now we consider the number of traversals, due to blockage, of edges on the boundary of
obstacles. As wavefronts expand, their descendant wavefronts may still be adjacent to the
same obstacles. Thus, we need to make sure that the edges on the boundaries of obstacles are
not traversed too often due to relocation because of blockage. We show that any edge on the
boundary of an obstacle is not traversed more than twice due to relocations because of blockage.
That is, the robot does not move back and forth between wavefronts on different sides of an
obstacle. Lemma 13 implies that each edge on the boundary of the obstacle is traversed at
most twice due to blockage.

Thus, since the edges on the boundary of an obstacle may be part of the pairs of edges
connecting vertices in a wavefront, the total number of times any edge can be traversed due to
blockage is at most three. ■

Theorem 11 The wavefront algorithm is linear in the number of edges in the city-block graph.

Proof: We show that the total number of edge traversals is no more than $15|E|$. Note that when
the procedures Create-monotone-paths, Expand, and Relocate are called, the robot
traverses edges in the environment. In addition, the robot traverses edges in the environment
to get back to s after exploration of each of the four regions. These are the only times the
robot actually traverses edges in the environment. Thus, to calculate the total number of edge
traversals, we count the edge traversals for each of these cases.

The robot traverses the edges on the monotone paths once when it explores them, and once
to get back to the start point. This is clearly at most $2|E|$ edge traversals. The robot walks
back to s four times, once after exploring each of the four regions. Thus the number of edges
traversed here is at most $4|E|$. The proof of Lemma 10 implies that the total number of edge
traversals caused by procedure Expand is at most $2|E|$. We now only need to consider the edge
traversals due to calls to procedure Relocate.

Procedure Relocate is called four times within Explore-area and Merge. The four calls
are due to expansion (line 5 of Explore-area), expiring (line 13 of Explore-area), merging
(line 9 of Merge), and blocking (line 30 of Explore-area). Relocations after expanding a
wavefront result in a total of $|E|$ edge traversals. Lemma 11 shows that edges are traversed
at most twice due to expiring wavefronts. Lemma 12 shows that edges are traversed at most
once due to relocations after merges. Finally, Lemma 14 shows that edges are traversed at most
three times due to relocations after blockage. Thus the total number of edge traversals due to
calls of procedure Relocate is at most $7|E|$.

Thus the total number of edges traversed by the wavefront algorithm is at most $15|E|$. A more
careful analysis of the wavefront algorithm can improve the constant factor. ■
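
As a check on the constant, the per-case bounds used in the proof can be added up term by term. The tally below only restates the bounds as they are quoted in the proof (including the factor of two quoted there for expiring wavefronts); it adds no new analysis.

```latex
\begin{align*}
\text{Relocate:}\quad
  & \underbrace{|E|}_{\text{expansion}}
  + \underbrace{2|E|}_{\text{expiring}}
  + \underbrace{|E|}_{\text{merging}}
  + \underbrace{3|E|}_{\text{blocking}} = 7|E| \\
\text{Total:}\quad
  & \underbrace{2|E|}_{\text{monotone paths}}
  + \underbrace{4|E|}_{\text{returns to } s}
  + \underbrace{2|E|}_{\text{Expand}}
  + \underbrace{7|E|}_{\text{Relocate}} = 15|E|
\end{align*}
```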

Theorem 12 A piecemeal algorithm based on the wavefront algorithm runs in time linear in 
the number of edges in the city-block graph. 

Proof: This follows immediately from Theorem 10 and Theorem 11. ■

3.6.3 The ray algorithm 

We now give another efficient optimally interruptible search algorithm, called the ray algorithm.
The ray algorithm is a variant of DFS that always knows a shortest path back to s. This thus
yields another efficient piecemeal algorithm for searching a city-block graph. This algorithm is
simpler than the wavefront algorithm, but may be less suitable for generalization, because it
appears more specifically oriented towards city-block graphs.

The ray algorithm also starts by finding the four monotone paths, and splitting the graph
into four regions to be searched separately. The algorithm explores in a manner similar to
depth-first search, with the following exceptions. Assume that it is operating in the northern
region. The basic operation is to explore a northern-going "ray" as far as possible, and then
to return to the start point of the ray. Along the way, side-excursions of one step are made to
ensure the traversal of east-west edges that touch the ray. Optimal interruptibility will always
be maintained: the ray algorithm will not traverse a ray until it knows a shortest path to s from
the base of the ray (and thus a shortest path to s from any point on the ray, by Lemma 6).

The high-level operation of the ray algorithm is as follows. (See Figure 3.11.) From each
point on the (horizontal segments of the) monotone paths bordering the northern region, a
north-going ray is explored. On each such ray, exploration proceeds north until blocked by an
obstacle or the boundary of the city-block graph. Then the robot backtracks to the beginning
of the ray and starts exploring a neighboring ray. As described so far, each obstacle creates
a "shadow region" of unexplored vertices to its north. These shadow regions are explored as
follows. Once the two back corners of an obstacle are explored, the shortest paths to the vertices
at the back of an obstacle are then known; the "meeting point" is then determined. Once the
meeting point for an obstacle is known, the shortest path from s to each vertex on the back
border of the obstacle is known. The robot can then explore north-going rays starting at each
vertex at the back border of the obstacle. There may be further obstacles that were all or
partially in the shadow regions; their shadow regions are handled in the same manner.
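
The decomposition of the northern region into rays and shadow rays can be pictured with the following Python sketch. It is a strong simplification under stated assumptions and not the ray algorithm itself: the function name `ray_order` is hypothetical, the region is a small grid whose row 0 plays the role of a horizontal monotone path, obstacles are given as sets of blocked cells, and shadow rays are simply deferred until all earlier rays are done instead of waiting for a meeting point. Side-excursions, backtracking costs, and shortest-path bookkeeping are not modeled.

```python
from collections import deque


def ray_order(width, height, obstacles):
    """Order in which free cells are visited by a simplified north-going ray sweep."""
    blocked = set(obstacles)
    order = []
    # One ray base per column on row 0 (the assumed monotone path).
    bases = deque((x, 0) for x in range(width))
    while bases:
        x, y = bases.popleft()
        # Explore the ray north until an obstacle or the boundary stops it.
        while y < height and (x, y) not in blocked:
            order.append((x, y))
            y += 1
        # Skip over the obstacle; the first free cell behind it starts a
        # deferred ray into the obstacle's shadow region.
        while y < height and (x, y) in blocked:
            y += 1
        if y < height:
            bases.append((x, y))
    return order


cells = ray_order(4, 4, {(1, 1), (2, 1)})
```

In this example the two columns behind the obstacle are visited last, after every ray that starts on row 0 has been explored, and each free cell is visited exactly once.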

We note that not all paths to s in the "search tree" defined by the ray algorithm are
shortest paths; the tree path may go one way around an obstacle while the algorithm knows
that the shortest path goes the other way around. However, the ray algorithm is nonetheless
an optimally interruptible search algorithm.

Theorem 13 The ray algorithm is a linear-time optimally interruptible search algorithm that
can be transformed into a linear-time piecemeal learning algorithm for a city-block graph.

Proof: This follows from the properties of city-block graphs proved in Section 3.6.1, and the
above discussion. In the ray algorithm each edge is traversed at most a constant number
of times. The linearity of the corresponding piecemeal learning algorithm then follows from
Theorem 7. ■

Figure 3.11: Operation of the ray algorithm.

3.7 Piecemeal learning of undirected graphs 

For piecemeal learning of arbitrary undirected graphs, we again turn our attention to
breadth-first search. As we mentioned earlier, standard BFS is efficient only when the robot can
efficiently switch or "teleport" from expanding one vertex to expanding another. In contrast, our
model assumes a more natural scenario where the robot must physically move from one vertex
to the next. We change the classical BFS model to a more difficult teleport-free exploration
model, and give efficient approximate BFS algorithms where the robot does not move much
further away from s than the distance from s to the unvisited vertex nearest to s. The
teleport-free BFS algorithms we present never visit a vertex more than twice as far from s as
the nearest unvisited vertex is from s.

Our techniques for piecemeal learning of arbitrary undirected graphs are inspired by the work
of Awerbuch and Gallager [6, 7]. We observe that our learning model bears some similarity to
the asynchronous distributed model. This similarity is surprising and has not been explored in
the past.

Our main theorem for piecemeal learning of arbitrary undirected graphs is: 

Theorem 14 Piecemeal learning of an arbitrary undirected graph G = (V, E) can be done in
time $O(E + V^{1+o(1)})$.

Proof: Following the Recursive-Strip algorithm, given in Section 3.7.3, the robot always
knows a path from its current location back to the start vertex of length at most the radius
of the graph. Thus Recursive-Strip is efficiently interruptible. The running time of this
algorithm is $O(E + V 2^{O(\sqrt{\log V \log\log V})}) = O(E + V^{1+o(1)})$. By Theorem 7, this algorithm
can be interrupted efficiently to give a piecemeal learning algorithm with running time
$O(E + V^{1+o(1)})$. ■
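
The simplification of the running time to $V^{1+o(1)}$ can be spelled out as follows. This is a short worked step, assuming the intermediate bound has the form $V \cdot 2^{O(\sqrt{\log V \log\log V})}$ with logarithms to base 2; the constant $c$ is introduced here only for illustration.

```latex
\begin{align*}
2^{c\sqrt{\log V \,\log\log V}}
  &= 2^{\,\log V \cdot c\sqrt{\frac{\log\log V}{\log V}}}
   = V^{\,c\sqrt{\frac{\log\log V}{\log V}}}, \\
c\sqrt{\frac{\log\log V}{\log V}} &\longrightarrow 0
  \quad\text{as } V \to \infty, \\
\text{hence}\quad V \cdot 2^{O(\sqrt{\log V \,\log\log V})} &= V^{1+o(1)}.
\end{align*}
```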

In the remainder of this section, we give three algorithms for piecemeal learning undirected
graphs. We first give a simple algorithm that runs in $O(E + V^{1.5})$ time. We then give a
modification of this algorithm that runs in $O((E + V^{1.5}) \log V)$ time. Although this algorithm
has slightly slower running time, we are able to make it recursive, giving a third algorithm
with almost linear running time: it achieves $O(E + V^{1+o(1)})$ running time. The most efficient
previously known algorithm has $O(E + V^2)$ running time.

3.7.1 Algorithm STRIP-EXPLORE 

This section describes an efficiently interruptible algorithm for undirected graphs with running
time $O(E + V^{1.5})$. It is based on breadth-first search.

A layer in a BFS tree consists of vertices that have the same shortest path distance to the 
start vertex. A frontier vertex is a vertex that is incident to unexplored edges. A frontier vertex 
is expanded when the robot has traversed all the unexplored edges incident to it. 

The traditional BFS algorithm expands frontier vertices layer by layer. In the teleport-free
model, this algorithm runs in time $O(E + rV)$, since expanding all the vertices takes time
$O(E)$, and visiting all the frontier vertices on layer i can be performed with a depth-first search
of layers 1 ... i in time $O(V)$, and there are at most r layers. The procedure Local-BFS
describes a version of the traditional BFS procedure that has been modified for our
teleport-free BFS model in two respects. First, the robot does not relocate to frontier vertices
that have no unexplored edges. Second, it only explores vertices within a given distance-bound L
of the given start vertex s. (The first modification, while seemingly straightforward, is essential
for our analysis of Strip-Explore, which uses Local-BFS as a subroutine.) A procedure call of
the form Local-BFS(s, r), where s is the start vertex of the graph and r is its radius, would
cause the robot to explore the entire graph.

Awerbuch and Gallager [6, 7] give a distributed BFS algorithm which partitions the network
in strips, where each strip is a group of L consecutive layers. (Here L is a parameter to be
chosen.) All vertices in strip $i - 1$ are expanded before any vertices in strip $i$ are expanded.
Their algorithms use as a subroutine breadth-first type searches with distance L.

Local-BFS(s, L)
1  For i = 0 To L - 1 Do
2      let verts = all vertices at distance i from s
3      For each u ∈ verts Do
4          If u has any incident unexplored edges
5          Then
6              relocate to u
7              traverse each unexplored edge
8                  incident to u
9  relocate to s
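The procedure can be simulated offline to count edge traversals. The sketch below is only an illustration under stated assumptions: the environment is a simple undirected graph given as a fully known adjacency map `adj` (the robot's own knowledge is the separate map `known`), the call starts with nothing explored, traversing a new edge is counted as going out and back (two traversals), and relocations follow shortest paths over explored edges. The names `known_distances` and `local_bfs` are hypothetical.

```python
from collections import deque

def known_distances(known, src):
    """BFS distances from src using only explored edges."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in known[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def local_bfs(adj, s, L):
    """Simulate Local-BFS(s, L); return (explored adjacency map, edge traversals)."""
    known = {s: set()}      # explored edges, stored as an adjacency map
    pos, steps = s, 0
    for i in range(L):                                   # line 1
        dist = known_distances(known, s)
        verts = [v for v, d in dist.items() if d == i]   # line 2
        for u in verts:                                  # line 3
            unexplored = [w for w in adj[u] if w not in known[u]]
            if not unexplored:                           # line 4: nothing to expand
                continue
            steps += known_distances(known, pos)[u]      # line 6: walk over explored edges
            pos = u
            for w in unexplored:                         # lines 7-8: out and back
                known[u].add(w)
                known.setdefault(w, set()).add(u)
                steps += 2
    steps += known_distances(known, pos)[s]              # line 9
    return known, steps

# Hypothetical environment: the path a - b - c - d, explored from a with L = 2.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
known, steps = local_bfs(adj, "a", 2)
```

With L = 2 the robot expands a and b, so it learns the edges a-b and b-c but never reaches d.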



Our algorithm, Strip-Explore, searches in strips in a new way. See Figure 3.12. The robot
explores the graph in strips of width L. First the robot does Local-BFS(s, L) to explore the
first strip. It then explores the second strip as follows. Suppose there are k frontier vertices
$v_1, v_2, \ldots, v_k$ in layer L; each such vertex is a source vertex for exploring the second strip. A
naive way of exploring the second strip is for the robot, for each i, to relocate to $v_i$ and then
find all vertices that are within distance L of $v_i$ by doing a BFS of distance-bound L from $v_i$
within the second strip. The robot thus traverses a forest of k BFS trees of depth L, completely
exploring the second strip. The robot then has a map of the BFS tree of depth L for the first
strip and a map of the BFS forest for the second strip, enabling it to create a BFS tree of
depth 2L for the first two strips. The robot continues, strip by strip, until the entire graph is
explored.

The naive algorithm described above is inefficient, due to the overlap between the trees in the
forest at a given level, causing portions of each strip to be repeatedly re-explored. The algorithm
Strip-Explore presented below solves this problem by using the Local-BFS procedure as
the basic subroutine, instead of using a naive BFS. (See Figure 3.12.)

Figure 3.12: In the naive algorithm, the shaded areas are retraversed completely. In
Strip-Explore, the shaded areas are passed through more than once only if necessary to get to
frontier vertices.

In Strip-Explore, the robot searches in a breadth-first manner, but ignores previously
explored territory. The only time the robot traverses edges that have been previously explored
is when moving to a frontier vertex it is about to expand. This results in retraversal of some
edges in previously explored territory, but not as many as in the naive algorithm.



Theorem 15 Strip-Explore runs in $O(E + V^{1.5})$ time.



Proof: First we count edge traversals for relocating between source vertices for a given strip.
For these relocations, the robot can mentally construct a tree in the known graph connecting
these vertices, and then move between source vertices by doing a depth-first traversal of this
tree. Thus the number of edge traversals due to relocations between source vertices for this
strip is at most 2V. Since there are $\lceil r/L \rceil$ strips, the total number of edge traversals due to
relocations between source vertices is at most $\lceil r/L \rceil \cdot 2V \le (r/L + 1) \cdot 2V = 2rV/L + 2V$.

Figure 3.13: Contrasting BFS and Local-BFS: Consider a BFS of depth 5 from $s_1$, followed
by a BFS of depth 5 from $s_2$. (The depth of the strip is L = 5.) The BFS from $s_2$ revisits
vertices a, b, c, d, e. On the other hand, if the BFS from $s_1$ is followed by a Local-BFS from
$s_2$, then it only revisits d, c, e. After edge (f, d) is found, vertex e is a frontier vertex that needs
to be expanded.

Now we count edge traversals for repeatedly executing the Local-BFS algorithm. First,
for the robot to expand all vertices and explore all edges, it traverses 2E edges. Next, each
time the relocate in line 9 of procedure Local-BFS is called, at most L edges are traversed.
To account for relocations in line 6 of procedure Local-BFS, we use the following scheme for
"charging" edge traversals. Say the robot is within a call of the Local-BFS algorithm. It has
just expanded a vertex u and will now relocate to a vertex v to expand it. Vertex v is charged
for the edges traversed to relocate from u to v. (We are only considering relocations within the
same call of the Local-BFS algorithm; relocations between calls of the Local-BFS algorithm
were considered above.) Source vertices are not charged anything. Moreover, the robot can
always relocate from u to v by going from u to the source vertex of the current local BFS, and
then to v, traversing at most 2L edges. Thus, each vertex is charged at most 2L when it is
expanded. Local-BFS never relocates to a vertex v unless it can expand vertex v (i.e., unless
v is adjacent to unexplored edges). Thus, all relocations are charged to the expansion of some
vertex, and the total number of edge traversals due to relocation is at most 2LV.

Thus the total number of edge traversals is at most $2rV/L + 2V + 3LV + 2E$, which is
$O(rV/L + LV + E)$. When L is chosen to be $\sqrt{r}$, this gives $O(E + V^{1.5})$ edge traversals. ■
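The trade-off behind the choice $L = \sqrt{r}$ can be seen by evaluating the bound from the proof for different strip widths. The following Python sketch is only a numerical illustration; the function names and the sample values of E, V and r are hypothetical.

```python
import math

def strip_explore_bound(E, V, r, L):
    """Upper bound on edge traversals from the proof of Theorem 15."""
    between_sources = math.ceil(r / L) * 2 * V   # tree traversal among sources, per strip
    expansion = 2 * E                            # each edge is traversed out and back
    return_to_source = L * V                     # line 9 of Local-BFS
    charged_relocations = 2 * L * V              # line 6 of Local-BFS
    return between_sources + expansion + return_to_source + charged_relocations

def best_strip_width(E, V, r):
    """Strip width in 1..r that minimizes the bound (first minimizer on ties)."""
    return min(range(1, r + 1), key=lambda L: strip_explore_bound(E, V, r, L))

# Hypothetical sizes: 1000 edges, 400 vertices, radius 100.
best = best_strip_width(1000, 400, 100)
```

A small L makes the number of strips, and hence the relocations between source vertices, large; a large L makes the relocations inside each Local-BFS call expensive. The minimizer is within a constant factor of $\sqrt{r}$.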

Procedure Strip-Explore, and the generalizations of it given in later sections, maintain
that $\Delta \le 2\delta$ at all times: the robot never visits a vertex more than twice as far from s as the
nearest unvisited vertex is from s. The worst case is while exploring the second strip.

3.7.2 Iterative strip algorithm

We now describe Iterative-Strip, an algorithm similar to the Strip-Explore algorithm.
It is an efficiently interruptible algorithm for undirected graphs inspired by Awerbuch and
Gallager's [6] distributed iterative BFS algorithm. Although its running time of
$O((V^{1.5} + E) \log V)$ is worse than the running time of Strip-Explore, its recursive version (described in
Section 3.7.3) is more efficient. (It is not clear how to recursively implement Strip-Explore
as efficiently, because the trees in a strip are not disjoint.)

With Iterative-Strip, the robot grows a global BFS tree with root s strip by strip, in a
manner similar to Strip-Explore. Unlike Strip-Explore, here each strip is processed several
times before it has correctly deepened the BFS tree by $\sqrt{r}$. We next explain the algorithm's
behavior on a typical strip by describing how a strip is processed for the first time, and then
for the remaining iterations.

1   For i = 1 To $\sqrt{r}$ Do
2       For each source vertex u in strip i Do
3           relocate to u
4           BFS from u to depth $\sqrt{r}$, but do not enter previously
                explored territory
5       While there are any active connected components Iterate
6           For each active connected component c Do
7               Repeat
8                   let $v_1, v_2, v_3, \ldots$ be active frontier vertices
                        exclusively in c with smallest depth among
                        active frontier vertices in c
9                   relocate to each of $v_1, v_2, v_3, \ldots$, and expand
10              Until no more active frontier vertices exclusively in c
11          determine new and active connected components



In the first iteration, a strip is explored much as in Strip-Explore. The robot explores
a tree of depth $\sqrt{r}$ from each source vertex, by exploring in breadth-first manner from each
source vertex, without re-exploring previous trees. Whenever the robot finds a collision edge
connecting the current tree to another tree in the same strip, it does not enter the other tree.
Unlike Strip-Explore, the robot does not traverse explored edges to get to the active frontier
vertices on other trees. Therefore, after the first iteration, the trees explored are approximate
BFS trees that may have frontier vertices with depth less than $\sqrt{r}$ from some source vertex.
These vertices become active frontier vertices for the next iteration. Thus, the current strip
may not yet extend the global BFS tree by depth $\sqrt{r}$, so more iterations are needed until all
frontier vertices are inactive and the global BFS tree is extended by depth $\sqrt{r}$ (see Figure 3.14).



[Figure labels: "active frontier vertices", "depth D"]

Figure 3.14: The iterative strip algorithm after the first iteration on the fourth strip. Two
connected components $c_1, c_2$ have been explored. The collision edges $e_1$ and $e_2$ connect the
first three approximate BFS trees. The dashed line shows how source vertices $s_1, s_2, s_3$ connect
within the strip. There are three active frontier vertices with depth less than $D + \sqrt{r}$.



In the second iteration (see Figure 3.15), the robot uses the property that two trees connected
by a collision edge form a connected component within the strip. (The graph to be explored is
connected, and thus forms one connected component; but we refer to connected components of
the explored portion of the graph contained within the strip.) The robot need not traverse any
edges outside the current strip to relocate between these active frontier vertices in the same
connected component. In the second and later iterations, the robot works on one connected
component at a time.
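The grouping of approximate BFS trees into connected components can be sketched with a union-find structure over the source vertices. This is only an illustration, assuming the trees found in the first iteration are vertex-disjoint; the names `components_in_strip`, `trees` and `collision_edges`, and the example data, are hypothetical.

```python
def components_in_strip(trees, collision_edges):
    """Group tree roots (source vertices) that are linked by collision edges.

    trees: dict mapping each source vertex to the set of vertices in its tree.
    collision_edges: list of (u, v) pairs whose endpoints lie in different trees.
    """
    owner = {v: s for s, verts in trees.items() for v in verts}
    parent = {s: s for s in trees}

    def find(x):
        # Follow parent pointers to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in collision_edges:
        a, b = find(owner[u]), find(owner[v])
        if a != b:
            parent[a] = b          # merge the two components

    groups = {}
    for s in trees:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical strip: four trees, two collision edges joining the first three.
trees = {"s1": {"s1", "a"}, "s2": {"s2", "b"}, "s3": {"s3", "c"}, "s4": {"s4"}}
components = components_in_strip(trees, [("a", "b"), ("b", "c")])
```

Here the trees rooted at s1, s2 and s3 form one component, and the tree rooted at s4 forms another.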

The robot explores active frontier vertices in one connected component as follows. It
computes (mentally) a spanning tree of the vertices in the current strip. This spanning tree lies
within the strip. Let d be the least depth of any active frontier vertex in the component from a
source vertex. It visits the vertices in the strip in an order determined by a DFS of the spanning
tree. As it visits active frontier vertices of depth d, it expands them. It then recomputes the
spanning tree (since the component may now have new vertices) and again traverses the tree,
expanding vertices of the appropriate next depth d'. Traversing a collision edge does not add
the new vertex to the tree, since this vertex has been explored before. This process continues
(at most $\sqrt{r}$ times) until no active frontier vertex in the connected component has distance less
than $\sqrt{r}$ from some source vertex in the component.

Figure 3.15: The iterative strip algorithm after the second iteration. Now the circled vertices
which were active frontier vertices at the beginning of the iteration are expanded. One of the
expansions resulted in a collision edge. Now the strip consists of only one connected component
(shaded area). There are six frontier vertices which become source vertices of the next strip.
All frontier vertices have depth $D + \sqrt{r}$.
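One pass of this procedure over a single component can be sketched as follows. The sketch assumes the explored part of the component is given as an adjacency map `adj_known` and that `active_depth` maps each active frontier vertex to its depth from a source vertex; these names, the helper functions and the example are hypothetical. It returns the vertices that would be expanded in this pass and the number of edge traversals used by the depth-first walk of the spanning tree.

```python
def spanning_tree(adj_known, root):
    """Spanning tree of the explored component, as a map from vertex to children."""
    tree = {root: []}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj_known[u]:
            if w not in tree:
                tree[w] = []
                tree[u].append(w)
                stack.append(w)
    return tree

def dfs_walk(tree, root):
    """Closed walk from root that uses every tree edge exactly twice."""
    walk = [root]

    def visit(u):
        for child in tree[u]:
            walk.append(child)
            visit(child)
            walk.append(u)

    visit(root)
    return walk

def one_pass(adj_known, root, active_depth):
    """Vertices of least active depth, in walk order, plus the cost of the walk."""
    walk = dfs_walk(spanning_tree(adj_known, root), root)
    d = min(active_depth.values())
    to_expand = []
    for v in walk:
        if active_depth.get(v) == d and v not in to_expand:
            to_expand.append(v)
    return to_expand, len(walk) - 1

# Hypothetical component with four known vertices and two active frontier vertices.
adj_known = {"s1": ["a", "b"], "a": ["s1", "c"], "b": ["s1"], "c": ["a"]}
to_expand, cost = one_pass(adj_known, "s1", {"b": 1, "c": 2})
```

The walk uses each tree edge twice, so a pass over a component with n known vertices costs 2(n - 1) traversals, which is the source of the $O(V_i)$ term per depth in the analysis below.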

The robot handles each connected component in turn, as described above. In the next
iteration it combines the components now connected by collision edges, and explores the new
active frontier vertices in these combined components. Lemma 15 states that at most log V
iterations cause all frontier vertices to become inactive. That is, all frontier vertices are at depth
$\sqrt{r}$ from the source vertices of this strip. These frontier vertices are the new sources for the
next strip.

Lemma 15 At most log V iterations per strip are needed to explore a strip and extend the
global BFS tree by depth $\sqrt{r}$.

Proof: If there are initially $l$ source vertices, then after the first iteration there are at most $l$
connected components. If a component does not collide with another active component, then
it will have no active frontier vertices for the next iteration. The only active components in
the next iteration are those that have collided with other components, and thus, each iteration
halves the number of components with active frontier vertices. After at most log V iterations
there is no connected component with active frontier vertices left. The robot then has a complete
map of the current strip and of the global BFS tree built in previous strips, so it can combine
this information and extend the global BFS tree by depth $\sqrt{r}$. ■

Theorem 16 Iterative-Strip runs in time $O((E + V^{1.5}) \log V)$.

Proof: We first count the number of edge traversals within a strip. Let $V_i$ and $E_i$ be the
number of vertices and edges explored in strip i. For each component, vertices of distance t
from some source vertex are expanded by computing a spanning tree of the component, doing
a DFS of the spanning tree, and expanding all vertices of distance t from some source vertex
(line 9). At each iteration (line 5), components are disjoint, so relocating to all vertices in the
strip of distance exactly t takes at most $O(V_i)$ edge traversals. Thus, in one iteration, relocating
to all vertices in the strip within distance $\sqrt{r}$ takes at most $O(\sqrt{r} V_i)$ edge traversals. Moreover,
note that in order for the robot to expand each vertex, it traverses at most $O(E_i)$ edges. Thus,
the total number of edge traversals for strip i in one iteration is $O(E_i + \sqrt{r} V_i)$. Combining this
with Lemma 15, completely exploring strip i takes $O((E_i + \sqrt{r} V_i) \log V)$ edge traversals
within the strip.

Now we count edge traversals for relocating between source vertices in strip i. As in the
proof of Theorem 15, in each iteration the robot traverses at most 2V edges to relocate between
source vertices. Since there are at most log V iterations, this results in $2V \log V$ edge traversals
between source vertices to explore strip i. Thus, the total number of edge traversals to explore
strip i is $O((E_i + \sqrt{r} V_i) \log V + 2V \log V)$. Summing over the $\sqrt{r}$ disjoint strips gives
$O((E + \sqrt{r} V) \log V + 2\sqrt{r} V \log V) = O((E + \sqrt{r} V) \log V) = O((E + V^{1.5}) \log V)$. ■

3.7.3 A nearly linear time algorithm for undirected graphs 

This section describes an efficiently interruptible algorithm, Recursive-Strip, which gives a
piecemeal learning algorithm with running time $O(E + V^{1+o(1)})$. Recursive-Strip is the
recursive version of Iterative-Strip; it provides a recursive structure that coordinates the
exploration of strips, of approximate BFS trees, and of connected components in a different
manner. The robot still, however, builds a global BFS tree from start vertex s strip by strip.
The robot expands vertices at the bottom level of recursion.

In Recursive-Strip, the depth of each strip depends on the level of recursion (see
Figure 3.16). If there are k levels of recursion, then the algorithm starts at the top level by splitting
the exploration of G into $r/d_{k-1}$ strips of depth $d_{k-1}$. Each of these strips is split into $d_{k-1}/d_{k-2}$
searches of strips of depth $d_{k-2}$, etc. We have $r = d_k > d_{k-1} > \cdots > d_1 > d_0 = 1$.

Figure 3.16: The recursive strip algorithm processing an approximate BFS tree from source
vertex $s_2$ to depth $d_{k-1} = L$. Recursive calls within the tree are of depth $d_{k-2} = L'$.

Each recursive call of the algorithm is passed a set of source vertices sources, the depth to
which it must explore, and a set T of all vertices in the strip already known to be less than
distance depth from one of the sources. The robot traverses all edges and visits all vertices
within distance depth of the sources that have not yet been processed by other recursive calls
at this level. Recursive-Strip({s}, r, {s}) is called to explore the entire graph.

At recursion level i, the algorithm divides the exploration into strips and processes each strip
in turn, as follows. Suppose the strip has $l$ source vertices $v_1, \ldots, v_l$. The strip is processed in
at most $\log l = O(\log V)$ iterations. In each iteration, the algorithm partitions T into maximal
sets $T_1, T_2, \ldots, T_k$ such that each set is known to be connected within the strip. Let $S_c$ denote
the set of source vertices in $T_c$. A DFS of the spanning tree of the vertices T gives an order for
the source vertices in $S_1, S_2, \ldots, S_k$; this spanning tree is used for efficient relocations between
these source vertices. Note that all source vertices are known to be connected through the
spanning tree of the vertices in T, but they might not be connected within the substrips. Since
relocations between the vertices in $S_c$ in the next level of recursion use a spanning tree of $T_c$,
for efficiency the vertices of $T_c$ must be connected within the strip. After partitioning the
vertices into connected components within the strip, for each connected component $T_c$, the
robot relocates (along a spanning tree) to some arbitrary source vertex in $S_c$. It then calls the
algorithm recursively with $S_c$, the depth of the strip, and the vertices $T_c$ which are connected
to the sources $S_c$ within the strip.

Recursive-Strip(sources, depth, T)
1   If depth = 1
2   Then
3       let $v_1, v_2, \ldots, v_k$ be the depth-first ordering of sources
            in spanning tree
4       For i = 1 To k Do
5           relocate to $v_i$
6           If $v_i$ has adjacent unexplored edges
7           Then traverse $v_i$'s incident edges
8                T = T ∪ {newly discovered vertices}
9       Return
10  Else
11      determine next-depth
12      number-of-strips ← depth/next-depth
13      For i = 1 To number-of-strips Do
14          determine set of source vertices
15          For j = 1 To number-of-iterations Do
16              partition vertices in T into maximal sets $T_1, T_2, \ldots, T_k$
                    such that vertices in each $T_c$ are known to be
                    connected within strip i
17              For each $T_c$ in suitable order Do
18                  let $S_c$ be the source vertices in $T_c$
19                  relocate to some source s ∈ $S_c$
20                  Recursive-Strip($S_c$, next-depth, $T_c$)
21                  T = T ∪ $T_c$
22      relocate to some s ∈ sources
23      Return

The remaining iterations in the strip combine the connected components until the strip is
finished. Then the robot continues with the next strip in the same level of recursion. Or, if it
finished the last strip, it relocates to its starting position and returns to the next higher level
of recursion.

Theorem 17 Recursive-Strip runs in time $O(E + V^{1+o(1)})$.

Proof: At a particular call of Recursive-Strip, there are 4 places the robot traverses edges:

1. expansion of vertices in line 7 

2. relocating to sources in lines 5 and 19 

3. relocations due to recursive calls in line 20 

4. relocation back to a beginning source vertex in line 22 

We count edge traversals for each of these cases. First we give some notation. We consider
the top level of recursion to be a level-k recursive call, and the bottom level of recursion to
be a level-0 recursive call. For a particular level-i call of Recursive-Strip, let $C_i$ denote the
number of edge traversals due to relocations, and let $E_i$ denote the number of distinct edges
that are traversed due to relocation. Let $V_i$ denote the number of vertices incident to these
edges and whose incident edges are all known at the end of this call. Let $\rho_i$ be a uniform upper
bound on $C_i/V_i$. Thus, if the depth of recursion is k then the total number of edge traversals
is bounded by $O(V \rho_k)$.

First we observe that each vertex is expanded at most once, so there are at most $O(E + V)$
edge traversals due to exploration at line 7 in the bottom level of recursion.

For a level-i call, we count the number of edge traversals for relocation between source
vertices (lines 5 and 19). Since all the source vertices in the call are connected by a tree of
size $O(V_i)$, relocating to all source vertices at the start of one strip takes $O(V_i)$ edge traversals.
With $d_i/d_{i-1}$ strips and log V iterations per strip, there are $V_i \log V \cdot \frac{d_i}{d_{i-1}}$ edge traversals for
relocations between source vertices.

We now count traversals for recursive calls (line 20) within a level-i call. Note that our
algorithm avoids re-exploring previously explored edges. Thus, for a level-i call, when working
on a particular strip $l$, for each iteration within this strip, the sets of vertices whose edges are
explored in each recursive call are disjoint. Suppose that, in this strip, in one iteration the
procedure makes k recursive calls, each at level $i - 1$. Then let $C_{i-1}^{j}$, $1 \le j \le k$, denote the
number of edge traversals due to relocations resulting from the j-th recursive call, and let $V_{i-1}^{j}$
denote the number of vertices adjacent to these edges. Furthermore, let $V_{i,l}$ denote the number
of vertices which are in strip $l$ of this procedure call at recursion level i. Then we would like first
to calculate $\sum_{j=1}^{k} C_{i-1}^{j}$, which is the number of edge traversals due to relocation in recursive
calls in one iteration within this strip. This is at most $\sum_{j=1}^{k} \rho_{i-1} V_{i-1}^{j} = \rho_{i-1} \sum_{j=1}^{k} V_{i-1}^{j}$. Since
the recursive calls are disjoint, $\sum_{j=1}^{k} V_{i-1}^{j} = V_{i,l}$, and thus the number of edge traversals due to
relocations in recursive calls in one iteration within this strip is at most $\rho_{i-1} V_{i,l}$. Finally, since
there are log V iterations in each strip, and all strips are disjoint from each other, the number
of edge traversals due to recursive calls is at most $\rho_{i-1} V_i \log V$.

Finally, note that we relocate once at the end of each procedure call of Recursive-Strip
(see line 22). This results in at most $V_i$ edge traversals.

Thus, the number of edge traversals due to relocation (not including relocations for expanding
vertices) is described by the recurrence
$C_i \le V_i \log V \cdot \frac{d_i}{d_{i-1}} + \rho_{i-1} V_i \log V + V_i$. Normalizing
by $V_i$, we get the following recurrence:

\[ \rho_i \le \frac{d_i}{d_{i-1}} \log V + \rho_{i-1} \log V + O(1). \]



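Solving the recurrence for $\rho_k$ gives:

\[
\begin{aligned}
\rho_k &\le \frac{d_k}{d_{k-1}} \log V + \rho_{k-1} \log V + O(1) \\
&\le \frac{d_k}{d_{k-1}} \log V + \frac{d_{k-1}}{d_{k-2}} \log^2 V + \rho_{k-2} \log^2 V + O(\log V) \\
&\le \frac{d_k}{d_{k-1}} \log V + \frac{d_{k-1}}{d_{k-2}} \log^2 V + \cdots + \frac{d_1}{d_0} \log^k V + \rho_0 \log^k V + \sum_{i=0}^{k-1} O(\log^i V) \\
&\le \frac{d_k}{d_{k-1}} \log V + \frac{d_{k-1}}{d_{k-2}} \log^2 V + \cdots + \frac{d_1}{d_0} \log^k V + O(\log^k V).
\end{aligned}
\]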



We note that $\rho_0 = O(1)$, since at the bottom level, if there are V vertices expanded, then the
number of edge traversals due to relocation is $O(V)$. The product of the first k terms in the
recurrence is $\frac{d_k}{d_0} (\log V)^{k(k+1)/2} = r (\log V)^{k(k+1)/2}$. We choose $d_{k-1}, d_{k-2}, \ldots$ by setting each of
the first k terms equal to the k-th root of this product. (Note that this also specifies how to
calculate depth $d_{i-1}$ from depth $d_i$.) Substituting, we get:

\[ \rho_k \le k \, r^{1/k} (\log V)^{(k+1)/2} + O(\log^k V). \]

We find the value of k that minimizes this by taking the logarithm and differentiating with
respect to k. Choosing $k = \sqrt{\log V / \log\log V}$ and simplifying gives us
$\rho_k \le 2^{O(\sqrt{\log V \log\log V})}$, and thus $C_k$ is at most $V \cdot 2^{O(\sqrt{\log V \log\log V})}$,
which is $V^{1+o(1)}$. Adding the edge traversals for relocation
to the edge traversals for expansion of vertices gives us $O(E + V^{1+o(1)})$ edge traversals total. ■
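The choice of strip depths in the proof can be made concrete numerically. The sketch below sets each of the k terms $(d_i/d_{i-1}) \log^{k-i+1} V$ equal to their common value $r^{1/k} (\log V)^{(k+1)/2}$ and solves for the depths from $d_k = r$ downwards. It is only an illustration: the function names are hypothetical, logarithms are taken base 2 as an assumption, the depths are left as real numbers without rounding, and the resulting values are only meaningful when r is large compared with log V.

```python
import math

def strip_depths(r, V, k):
    """Real-valued depths [d_0, d_1, ..., d_k] that equalize the k terms."""
    log_v = math.log2(V)
    common = r ** (1 / k) * log_v ** ((k + 1) / 2)   # k-th root of the product
    depths = [float(r)]                              # d_k = r
    for i in range(k, 0, -1):
        # (d_i / d_{i-1}) * log_v^(k-i+1) = common, solved for d_{i-1}
        depths.append(depths[-1] * log_v ** (k - i + 1) / common)
    return depths[::-1]

def rho_bound(r, V, k):
    """Leading term k * r^(1/k) * (log V)^((k+1)/2) of the bound on rho_k."""
    return k * r ** (1 / k) * math.log2(V) ** ((k + 1) / 2)

# Hypothetical sizes: radius 2^20 and 2^20 vertices, with two levels of recursion.
depths = strip_depths(2 ** 20, 2 ** 20, 2)
```

The computed $d_0$ comes out as 1 (up to floating-point error), which matches the requirement $d_0 = 1$ and serves as a consistency check on the product formula.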



3.8 An Application to Treasure Hunting 

We now consider an application of our algorithms to the problem of finding a treasure (or a lost
child, or a particular landmark) in an unknown, potentially infinite graph G = (V, E). If the
robot searching for the treasure knows that the treasure is close to its start location, it should
explore in a manner such that it does not get too far away from this location.

We give the procedure Treasure-Search, which uses the Recursive-Strip algorithm as
a subroutine. If the treasure is distance $\delta_T$ away from the source vertex, this algorithm maintains
the condition that the robot is never further from the source than $\Delta$, where $\Delta \le \delta_T + o(\delta_T)$.
Following procedure Treasure-Search, the robot traverses $O(E + V^{1+o(1)})$ edges, where E
and V are the total number of distinct edges and vertices within radius $\Delta$ from the source.

The robot explores the graph for the treasure in phases. In each phase, the size of the strip to
be explored changes. The change at phase i depends on $\epsilon_i = 1/\sqrt{i}$. Initially, the robot explores
the graph out to distance $r_1 = 1 + \epsilon_1$. Next, the robot extends its exploration by a factor of
$1 + \epsilon_2$. That is, the size of the next strip is $(1 + \epsilon_1)(1 + \epsilon_2) - (1 + \epsilon_1)$, and at the end of the second
phase, the robot has learned the graph out to distance $r_2 = (1 + \epsilon_1)(1 + \epsilon_2)$. After extending
the next strip, the robot has learned the graph out to distance $r_3 = (1 + \epsilon_1)(1 + \epsilon_2)(1 + \epsilon_3)$,
and so on. In each phase i, the robot initially calls Recursive-Strip from each of the source
vertices (vertices at distance $r_{i-1}$). When the robot finds collision edges, it does not re-explore
edges. Thus, within each phase, it may take up to log V iterations (as in Iterative-Strip and
Recursive-Strip) before it has explored the entire strip.

Treasure-Search(s)
            Recursive-Strip({s}, $r_1$, {s})
        Else
            let T be the set of source vertices distance
                $r_{i-1}$ away from s
12          For j = 1 To number-of-iterations Do
13              partition vertices in T into maximal sets $T_1, \ldots, T_k$
                    such that vertices in each $T_c$ are known to be
                    connected within strip i
14              For each $T_c$ in suitable order Do
15                  let $S_c$ be the source vertices in $T_c$
16                  relocate to some source s ∈ $S_c$
17                  Recursive-Strip($S_c$, next-depth, $T_c$)
18                  T = T ∪ $T_c$



Lemmas 16 and 17 bound the number of phases in the Treasure-Search procedure. Using
Lemma 16, we can show that the robot does not get too far away from the source vertex, and
using Lemma 17, we can bound the number of edges the robot traverses.

Lemma 16 The number of phases in Treasure-Search is at least $\log \delta_T$.

Proof: Since $\epsilon_1 > \epsilon_2 > \epsilon_3 > \cdots$, we know that, for any j, $(1 + \epsilon_1)(1 + \epsilon_2) \cdots (1 + \epsilon_j) \le (1 + \epsilon_1)^j$.
Thus, if we let j be the smallest number such that $(1 + \epsilon_1)^j \ge \delta_T$, then we know that the
number of phases i to reach the treasure at $\delta_T$ is at least j. Since $\epsilon_1 = 1$, we have $2^j \ge \delta_T$, or
$j \ge \log \delta_T$. ■



Lemma 17 The number of phases in Treasure-Search is at most $4 \ln^2 \delta_T + 1$.

Proof: A treasure at depth $\delta_T = 1$ is found in the first phase, so we consider only $\delta_T > 1$. We
know that for any j, $(1 + \epsilon_j)^j \le (1 + \epsilon_1)(1 + \epsilon_2) \cdots (1 + \epsilon_j)$. Thus, if $(1 + \epsilon_j)^j \ge \delta_T$, we know that the
number of phases i is at most j. So we show the lemma by showing that
$(1 + \epsilon_{4 \ln^2 \delta_T})^{4 \ln^2 \delta_T} \ge \delta_T$.
Equivalently, we would like to show
$4 \ln^2 \delta_T \, \ln(1 + \epsilon_{4 \ln^2 \delta_T}) = 4 \ln^2 \delta_T \, \ln\left(1 + \frac{1}{2 \ln \delta_T}\right) \ge \ln \delta_T$.

For $|x| < 1$, using a Taylor expansion, we have $\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$. For $0 < x < 1$,
we have $\ln(1 + x) \ge x - \frac{x^2}{2}$. So
$4 \ln^2 \delta_T \, \ln\left(1 + \frac{1}{2 \ln \delta_T}\right) \ge (4 \ln^2 \delta_T)\left(\frac{1}{2 \ln \delta_T} - \frac{1}{8 \ln^2 \delta_T}\right) = 2 \ln \delta_T - 1/2$,
which is at least $\ln \delta_T$ for $\delta_T \ge 2$. ■
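The phase schedule and the two bounds on the number of phases can be checked numerically. The sketch below assumes $\epsilon_i = 1/\sqrt{i}$ as in the description of the phases, takes the logarithm in Lemma 16 to be base 2 (consistent with $2^j \ge \delta_T$ in its proof), and treats radii as real numbers; the function names are hypothetical.

```python
import math

def phase_radii(delta_T):
    """Radii r_1, r_2, ... reached after each phase, until r_i >= delta_T."""
    radii, r, i = [], 1.0, 0
    while True:
        i += 1
        r *= 1 + 1 / math.sqrt(i)      # extend by a factor of 1 + epsilon_i
        radii.append(r)
        if r >= delta_T:
            return radii

def phases_within_bounds(delta_T):
    """Check Lemma 16 (lower bound) and Lemma 17 (upper bound) for one delta_T."""
    phases = len(phase_radii(delta_T))
    lower = math.log2(delta_T)
    upper = 4 * math.log(delta_T) ** 2 + 1
    return lower <= phases <= upper

# Hypothetical treasure distance of 10: the radii grow as 2, 3.41..., 5.38..., ...
radii = phase_radii(10)
```

For a treasure at distance 10 the schedule needs five phases, which lies between the lower bound of about 3.3 and the upper bound of about 22.2.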



Theorem 18 The robot is never further than $\delta_T + \delta_T / \sqrt{\log \delta_T}$ from the source vertex.

Proof: Let $\Delta$ be the furthest distance the robot gets from the source vertex. Let i be the
number of phases that need to be explored to get out to depth $\delta_T$. Then, $\Delta - \delta_T$ is at most
the depth of the strip in the i-th phase. That is,
$\Delta - \delta_T \le (1 + \epsilon_1)(1 + \epsilon_2) \cdots (1 + \epsilon_i) - (1 + \epsilon_1)(1 + \epsilon_2) \cdots (1 + \epsilon_{i-1}) = (1 + \epsilon_1)(1 + \epsilon_2) \cdots (1 + \epsilon_{i-1}) \epsilon_i \le \delta_T \epsilon_i$.
Lemma 16 shows that the total number of strips explored is at least $\log \delta_T$. Thus, $\epsilon_i$ is at most
$1/\sqrt{\log \delta_T}$, and $\Delta \le \delta_T + \delta_T / \sqrt{\log \delta_T} = \delta_T + o(\delta_T)$. ■

Theorem 19 Procedure Treasure-Search traverses at most $O(E + V^{1+o(1)})$ edges, where E
and V are the total number of distinct edges and vertices within radius $\Delta$ from the source.

Proof: Since the edges in the different phases are disjoint, the number of edges traversed,
ignoring relocations between source vertices in line 16, is at most $O(E + V^{1+o(1)})$. To get
between source vertices in line 16, a spanning tree of the known vertices can be used. (Note
that for recursive calls of Recursive-Strip, the algorithm relocates between source vertices
using the vertices connected within the appropriate strip.) By Lemma 17, we know the number
of phases is at most $4 \ln^2 \delta_T + 1$, and in each phase it may take up to log V iterations to explore
the entire strip. Thus there are an additional $(4 \ln^2 \delta_T + 1) V \log V$ edge traversals due to
relocations between source vertices, and this gives a total of $O(E + V^{1+o(1)})$ edge traversals for
the entire Treasure-Search procedure. ■

3.9 Conclusions

We have presented an efficient $O(E + V^{1+o(1)})$ algorithm for piecemeal learning of arbitrary,
undirected graphs. For the special case of city-block graphs, we have given two linear time
algorithms. We leave as open problems finding linear time algorithms (if they exist) for the
piecemeal learning of:

• grid graphs with non-convex obstacles,

• other tessellations, such as triangular tessellations with triangular obstacles, and

• more general classes of graphs, such as the class of planar graphs.

• arbitrary, undirected graphs



Chapter 4 



Learning-based algorithms for 
protein motif recognition 



4.1 Introduction 

One of the most important problems in computational biology is that of predicting how a
protein will fold in three dimensions when we only have access to its one-dimensional amino acid
sequence. Structure prediction has practical importance, as the biological function of a protein
depends upon its structure or fold. Unfortunately, determining the three-dimensional structure
of a protein is very difficult. Experimental approaches such as NMR and X-ray crystallography
are expensive and time-consuming (they can take years), and often do not work at all. Therefore,
computational techniques that predict protein structure based on already available sequence
data can help speed up the understanding of protein functions.

An important first step in tackling the protein folding problem is a solution to the structural
motif recognition problem: given a known local three-dimensional structure, or motif, determine
whether this motif occurs in a given amino acid sequence, and if so, in what positions. In this
chapter, we focus on a special type of α-helical motif, known as the coiled coil motif (see
Section 4.2), although the techniques presented can be applied to other motifs as well.

Most approaches to the motif recognition problem work only for motifs that are already
well-studied; that is, they are known to occur in many sufficiently diverse proteins. This knowledge
usually comes from biologists who have studied many examples of the motif. However, there
are many motifs for which only a small subset of examples are known, and this subset is often
not rich enough to be representative of the motif. Thus, for lack of data, current prediction
methods, ranging from straightforward sequence alignments to more complicated methods such
as those based on profiles of the motifs, often fail to identify such motifs.

For example, in the case of the coiled coil motif, most known instances are 2-stranded coiled
coils (i.e., coiled coils consisting of 2 α-helices). As a result, known prediction algorithms work
well for predicting 2-stranded coiled coils [14, 13, 12, 42, 58, 63], but do not work as well for the
related 3-stranded coiled coil motif (i.e., coiled coils consisting of 3 α-helices), due to the lack
of known 3-stranded coiled coil sequences. That is, for 3-stranded coiled coils, these algorithms
have a large amount of overlap between the scores for sequences that do not contain coiled coils
and sequences that do.

Our results 

In this chapter, we use learning theory to improve existing methods for protein structural motif
recognition, particularly in the case where only a few examples of the motif are known. Our
main result is a linear-time learning algorithm that uses information obtained from a database 
of sequences of one motif to make predictions about a related or similar motif. 

The problem we explore can be viewed as a concept learning problem, where the algorithm
is given labeled and unlabeled examples, and its goal is to find a concept which gives labels
to all the examples. Unlike many concept learning frameworks, this problem is not completely
supervised; this type of learning, which we refer to as semi-supervised learning, is often
necessary in real-life learning problems. We find this to be true in our test domain, where our goal is
to identify sequences that contain coiled coils from a set of protein sequences which may or may
not contain coiled coils. In particular, we are interested in recognizing both 2- and 3-stranded
coiled coils. Unfortunately, the majority of the data we have consists of 2-stranded coiled coils.
In addition, although many biologists are interested in 3-stranded coiled coils, there is little
well-analyzed data available on them. Thus, because of the lack of data and current biological
knowledge, supervised learning (i.e., the algorithm is given a large enough set of examples of
both 2- and 3-stranded coiled coils on which to train) is not currently feasible for our problem,
and semi-supervised or even unsupervised learning (with no labeled examples) is the only type
of learning which is possible. At first glance, this learning problem seems challenging,
since we are trying to come up with an algorithm which generalizes the data we have
for 2-stranded coiled coils to also pick out 3-stranded coiled coils. However, we show empirically
that for our test domain, semi-supervised learning gives excellent results. In particular, we have
tested our program and show that our algorithm's performance is substantially better than that
of previously known algorithms for recognizing 3-stranded coiled coils.

Our algorithm starts with an original database of a base motif, and the goal is to develop a 
more general database of a target motif, which is related to the base motif in structure. (The 
target motif includes the base motif as a special case.) In other words, we would like to convert
a good predictor for the base motif into a good predictor for the target motif. Our algorithm 
has the following key features: 

• The algorithm iteratively scans a large database of test sequences to find sequences that 
are presumed to fold into the target motif. The selected sequences are then used to update 
the parameters of the algorithm; these updates affect the performance of the algorithm 
in the next iteration. 

• In each iteration, the algorithm scores all the sequences based on its current estimates of
the parameters and the theoretical framework developed in [12]. 

• In each iteration, the algorithm uses randomness to select which sequences are presumed
to fold into the target motif. 

• The selected sequences are used in the beginning of the next iteration to update the 
parameters of the algorithm in a Bayesian-like weighting scheme. 

There are several ways in which our iterative algorithm is kept running in a "safe" fashion,
without increasing the false positive rate by incorporating sequences into the final database that
do not fold into the motif. First, we begin with a mathematically sound scoring subroutine
that experimentally has a low false positive rate. Second, our method of computing likelihoods
ensures that only a certain fraction of all residues are scored as positive examples of the motif
(see section 4.3). Finally, while evaluating our program, we run the program with sequences
that are known not to contain coiled coils, and this has helped us determine when the algorithm
is performing well. 




This methodology does not appear to have been explored much in the biological literature. 
Although a few papers have dealt with iterative algorithms [73, 3, 46, 36], they do not use
randomness and weighting for updating of parameters. In our experience, we find that these
components of the algorithm are critical to achieving good performance. 

Implementation results 

In order to demonstrate the efficacy of our methods, we test them on the domain of 2- and
3-stranded coiled coils (see section 4.4). 

First, we show how to use our methods to recognize 3-stranded coiled coils given examples of
2-stranded coiled coils. In other words, starting with a base motif of 2-stranded coiled coils, we
learn the target motif comprising 2- and 3-stranded coiled coils. The initial predictor already
has good performance on 2-stranded coiled coils, so we test our algorithm by its performance
on 3-stranded coiled coils. 

We evaluate our algorithm on 3-stranded coiled coils with respect to two statistical cross
validation tests: the "leave one out" test and the "leave half out" test. In the first scenario,
the algorithm starts with data from the 2-stranded coiled coil database, and iterates on a test
set that contains sequences which are known to form 3-stranded coiled coils, sequences which
are thought to form 3-stranded coiled coils, sequences for which no structural information is
available, and sequences which are known not to contain coiled coils. The category of each
sequence in this test set is not known to the algorithm, and the sequences which do not contain
coiled coils are given to the algorithm in order to test its robustness. At the end of the procedure,
the algorithm is evaluated by the number of the 3-stranded coiled coil sequences which it
recognizes. Each time a sequence that is present in the database the algorithm is building is
scored, it is removed from that database to avoid the possibility of unfairly biasing the test. In
this scenario, we find that our algorithm greatly enhances the recognition of 3-stranded coiled
coils, without affecting its performance on sequences that are known not to contain coiled coils.
In particular, we are able to select 93% of the sequences that are conjectured by biologists to
contain coiled coils, with no false positives out of the 286 sequences known not to contain coiled
coils. Previously, the best performance without false positives was 67%.

We also test our algorithm on 3-stranded coiled coils in a much more difficult scenario.
In particular, instead of cross validating our procedure by leaving out just one sequence at
a time when testing, the algorithm iterates on test sequences that contain only half of the
sequences known to form 3-stranded coiled coils. It is then evaluated by its performance on
the 3-stranded coiled coil sequences that are not iterated upon. In this scenario, we also find
improved performance. The 3-stranded coiled coil sequences are split in half 3 times, and on
average, the algorithm is able to select 85% of the left out 3-stranded coiled coil sequences,
with likelihood scores higher than that of the highest scoring negative sequence. On average,
the previous best performance without false positives was 67%.

Finally, we test our program on subfamilies of 2-stranded coiled coils using the leave one
out criterion. For 2-stranded coiled coils, we have a good data set consisting of a diverse set
of sequences. However, to test our program LearnCoil, we simulate a limited data problem by
running it on subfamilies of 2-stranded coiled coils. That is, one subfamily of
2-stranded coiled coils is chosen to make up the base motif, and the class of all 2-stranded coiled
coils is the target motif. Here we find that we have excellent performance; i.e., we are able to
completely learn the coiled coil regions in our entire 2-stranded coiled coil database starting
from a database consisting of coiled coils from any one subfamily. Based on our experiments,
such performance does not appear to be possible without the use of our iterative algorithm. In
particular, the best performance for the non-iterative approach ranges between 70 and 88%.

Biological significance 

As a consequence of this work, we have identified many new sequences that we believe
contain coiled coils or coiled-coil-like structures, such as the envelope proteins of mouse hepatitis
virus and human rotavirus. One of our more striking findings is the existence of one and
occasionally two coiled-coil-like regions in the envelope proteins of many retroviruses, including
Human Immunodeficiency Virus (HIV), Simian Immunodeficiency Virus (SIV), and Human
T-cell Lymphotropic Virus (HTLV). Independent experimental investigations have also predicted
these coiled-coil-like regions in HIV and SIV [19, 56].

4.2 Further background

The coiled coil motif is found in fibrous proteins, DNA binding proteins, and in tRNA-synthetase
proteins. Recently it has been proposed that the 3-stranded coiled coil motif acts as the cell
fusion mechanism for many viruses, and algorithms for predicting these structures could aid in
the study of how viruses invade cells. Computational methods [14, 58] have already identified
such coiled coil regions in influenza virus hemagglutinin and Moloney murine leukemia virus
envelope protein; both of these predictions have been corroborated in the laboratory [30, 40].

Coiled coils are a particular type of α-helix, consisting of two or more α-helices wrapped
around each other with a slight left-handed superhelical twist. Coiled coils have a cyclic repeat
of seven positions, a, b, c, d, e, f, and g (see Figure 4.1). The seven positions are spread out
along two turns of the helix. Coiled coils show a characteristic heptad repeat with hydrophobic
residues found in positions a and d, and this repeat makes coiled coils particularly amenable to
recognition by computational techniques. 

Computational methods have been quite successful for predicting coiled coils [63, 58, 42, 12,
13, 14]. These techniques can be described, broadly, as follows:

1. Collect a database of known coiled coils and available amino acid subsequences. 

2. Determine whether the unknown sequence shares enough distinguishing features with the 
known coiled coils to be considered a coiled coil. 

Standard approaches [63, 58] look at the frequencies of each amino acid residue in each of
the seven repeated positions. Overall, this singles method performs fairly well. When the NewCoil
program of Lupas et al. [58] is tested on the PDB (the database of all solved protein structures),
it finds all sequences which contain coiled coils. On the other hand, 2/3 of the sequences it
predicts to contain coiled coils do not. That is, the false positive rate for the standard method
is quite high. 

Figure 4.1: (a) Top view of a single strand of a coiled coil. Each of the seven positions {a, b, c, d, e, f, g}
corresponds to the location of an amino acid residue which makes up the coiled coil. The arrows between
the seven positions indicate the relative locations of adjacent residues in an amino acid subsequence.
The solid arrows are between positions in the top turn of the helix, and the dashed arrows are between
positions in the next turn of the helix. (b) Side view of a 2-stranded coiled coil. The two coils are next
to each other in space, with the a position of one next to the d position of another. The coils also slightly
wrap around each other (not shown here).

These approaches based on the singles method build a table from the coiled coil database
that represents the relative frequency of each amino acid in each position; that is, there is a table
entry for each amino acid/coiled coil position pairing. For example, for Leucine and position
a, the entry in the table is the percentage of position a's in the coiled coil database which are
Leucine, divided by the percentage of residues in Genbank (a large protein sequence database)
which are Leucine. For instance, if the percentage of position a's in the coiled coil database
which are Leucine is 27%, and the percentage of residues in Genbank which are Leucine is 9%,
then the table entry value for the pair Leucine and position a is 3. Intuitively, this table entry
represents the "propensity" that Leucine is in position a in a coiled coil.
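
The table construction just described can be illustrated with a minimal Python sketch. The function name `propensity_table` and its input layout (a list of amino acid/position pairs for the coiled coil database, and a list of strings for the background database) are assumptions made for illustration; they are not taken from the programs cited above.

```python
from collections import Counter

def propensity_table(motif_residues, background_sequences):
    """Build a singles table: (amino acid, position) -> relative frequency.

    motif_residues: list of (amino_acid, heptad_position) pairs taken from
        the coiled coil database (hypothetical input layout).
    background_sequences: list of strings from the background database.
    """
    motif_residues = list(motif_residues)
    pair_counts = Counter(motif_residues)
    position_totals = Counter(pos for _, pos in motif_residues)

    background_counts = Counter()
    for seq in background_sequences:
        background_counts.update(seq)
    background_total = sum(background_counts.values())

    table = {}
    for (aa, pos), count in pair_counts.items():
        if background_counts[aa] == 0:
            continue  # no background estimate; left undefined in this sketch
        motif_freq = count / position_totals[pos]
        background_freq = background_counts[aa] / background_total
        table[(aa, pos)] = motif_freq / background_freq
    return table
```

With 27% Leucine at position a in the motif data and 9% Leucine in the background, the entry for Leucine at position a comes out as 3 (up to floating-point rounding), matching the worked example in the text.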

The singles method approach [58] actually looks at 28-long windows, since stable coiled
coils are believed to be at least 28 residues long. Thus for each residue, it looks at each possible
position (a through g), and at all 28-long windows that contain it. It then calculates the relative
frequencies for each residue in the window. If the product of the relative frequencies for each
residue in some window is greater than some threshold, it concludes that the residue is part of
a coiled coil. 
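
The window test can be sketched as follows. This is an illustrative reading of the description above, not the NewCoil implementation: the function name `coiled_residues`, the `missing` default for table entries that are absent, and the use of a log-scale threshold are all assumptions.

```python
import math

HEPTAD = "abcdefg"

def coiled_residues(sequence, table, log_threshold, window=28, missing=0.1):
    """Flag residues that lie in some window whose propensity product is high.

    table: dict mapping (amino_acid, heptad_position) -> relative frequency.
    log_threshold: threshold on the log of the product (assumed form).
    missing: relative frequency used for pairs absent from the table
        (hypothetical choice; it must be positive).
    """
    n = len(sequence)
    flagged = [False] * n
    for start in range(n - window + 1):
        # Try each heptad position as the register of the window's first residue.
        for offset in range(7):
            log_product = 0.0
            for j in range(window):
                pos = HEPTAD[(offset + j) % 7]
                freq = table.get((sequence[start + j], pos), missing)
                log_product += math.log(freq)
            if log_product > log_threshold:
                for j in range(start, start + window):
                    flagged[j] = True
                break  # this window already qualifies
    return flagged
```

Summing logarithms rather than multiplying 28 frequencies avoids numerical underflow; the comparison is otherwise equivalent to thresholding the product.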

Recently researchers have put this problem within a probabilistic framework [12, 13, 14],
and have given linear-time algorithms for predicting coiled coils by approximating dependencies
between positions in the coiled coil using pairwise frequencies. This method for prediction uses
estimates of probabilities for singles and pair positions. For example, in addition to estimating
the probability that a Leucine appears in position a of a coiled coil, it also estimates the
probability that a Leucine appears in position a of a coiled coil with a Valine appearing in the
following d position. For a given residue's contribution to the score, the algorithm considers
residues at the structurally relevant distances i = 1, i = 2 and i = 4, calculating the geometric
mean of the three quantities P(k, k + i)/P(k + i), where P(k, k + i) is the probability of finding
residues k and k + i distance i apart in a coiled coil, and P(k + i) is the probability of finding
residue k + i in a coiled coil. 
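
A minimal sketch of this per-residue quantity is given below. The dictionary layouts for `pair_prob` and `single_prob`, the function name, and the default distances (1, 2, 4) are assumptions for illustration; the full PairCoil scoring in [12, 14] involves more than this one step.

```python
import math

HEPTAD = "abcdefg"

def pair_contribution(sequence, k, position, pair_prob, single_prob,
                      distances=(1, 2, 4)):
    """Geometric mean over distances i of P(k, k+i) / P(k+i).

    position: heptad position letter assigned to residue k.
    pair_prob: dict keyed by (aa_k, position_k, aa_later, i)  (assumed layout).
    single_prob: dict keyed by (aa, position)                  (assumed layout).
    """
    start = HEPTAD.index(position)
    log_sum = 0.0
    for i in distances:
        later_position = HEPTAD[(start + i) % 7]
        joint = pair_prob[(sequence[k], position, sequence[k + i], i)]
        marginal = single_prob[(sequence[k + i], later_position)]
        log_sum += math.log(joint / marginal)
    # exp of the mean log is the geometric mean of the ratios
    return math.exp(log_sum / len(distances))
```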

This method of predicting coiled coils has been very effective. When tested on the PDB, the
PairCoil algorithm based on this method selects out all sequences that contain coiled coils,
and rejects all the sequences that do not contain coiled coils. Furthermore, when tested on a
database of 2-stranded coiled coils (with a sequence removed from the database at the time it
is scored), each amino acid residue in a coiled coil region is correctly labeled as being part of a
coiled coil. 

Since the PairCoil algorithm has better performance than the singles method algorithm,
particularly with respect to the false-positive rate, this is the scoring method we build on, as
well as the scoring method to which we compare our results. 

Other types of iterative approaches have been applied to sequence alignment and protein
structure prediction by researchers [73, 3, 46, 36]. Algorithmically, our approach differs from
these approaches in two major ways. The first is our use of randomness to incorporate sequences
into our database, and the second is our use of weighting to update the database (see section 4.3).
In addition, several of these papers are directed toward sequence alignment, and sequence
alignment is not an effective tool for predicting coiled coils, as the various subfamilies of coiled
coils do not align well to each other. Also, since the goal of these other methods is often to
output potential matching alignments, the testing of these algorithms is quite different. In
particular, although some of these approaches use the "leave one out" criterion, to the best of
our knowledge, none of them test performance with the "leave half out" criterion.

Various machine learning techniques have been applied to the protein structure prediction
problem. The two main approaches are neural nets (e.g., [47, 67, 59]) and hidden Markov
models (e.g., [53, 9]). Both of these approaches require adequate data on the target motif,
since there is a "training session" on sequences that are known to contain the target motif.
Our approach differs from these methods since it does not require well-analyzed data on the
target motif per se. Instead it uses already available data on a base motif and generalizes it
to recognize the target motif, by running on a large number of sequences, some of which are
suspected to fold into the target motif.

Figure 4.2: Our basic learning algorithm. Initially, the algorithm starts off with a test set of examples
and a set of initial parameters. In each iteration, the algorithm selects new examples, and re-estimates
its parameters.

Other learning approaches which have been applied to protein structure prediction include 
rule-based methods (e.g., [60]).



4.3 The algorithm 

We first describe the general framework for our algorithm. Namely, we are initially given a set
of parameters that help characterize our base concept, and a set of test examples. Our goal is to
decide which of these test examples are positive examples of some target concept. In addition,
we know that the target concept is a generalization of the base concept. Our algorithm takes
advantage of the fact that the base concept is somewhat related to the target concept. In
particular, once the algorithm has identified some of the test examples that are presumed to be
related to the base concept, it can modify its database by "adding" these newly found examples.
Examples are selected by a randomized procedure based on likelihoods. This process is then
iterated, as the added examples change the scores of other examples. (See figure 4.2.)

We have implemented our learning algorithm for the protein motif recognition problem. In
particular, our learning algorithm LearnCoil proceeds as follows. It is given two inputs: a
database of a base motif which is related to the target motif we are interested in, and a large
database of iteration test sequences, which consists of sequences that we believe contain the
target motif as well as many other sequences of unknown structure. In practice, we generally
include in the iteration test sequences some fraction of the PIR (a large protein sequence
database), the sequences from the PDB (the database of solved protein structures) that are
known not to fold into the target motif, and sequences conjectured by biologists to fold into
the target motif. 

Initially, the algorithm estimates pair and singles amino acid residue probabilities for the
motif's positions. Then the algorithm iterates four basic steps: 

1. The algorithm uses its estimates of the pair and singles probabilities to determine a
likelihood function, which maps residue scores to a likelihood of the residue belonging to
the target motif.

2. The algorithm scores each of the iteration test sequences using the estimated probabilities,
and calculates the likelihoods for each of these sequences.

3. The algorithm flips coins with probability proportional to the likelihood of each score to 
determine which parts (if any) of each sequence are presumed to be part of the target 
motif. The residues which are thus determined to be presumed examples of the target 
motif make up the new database for the next iteration. 

4. The algorithm uses the base motif database and the new database just determined in this 
iteration to update its estimates of the singles and pair probabilities for the target motif 
using a Bayesian-like weighting scheme (see section 4.3.4). 

The algorithm continues iterating until the new database stabilizes. 
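
The control flow of the four steps can be summarized in a short Python sketch. It is a schematic only: every callable passed in (`estimate_probs`, `fit_likelihood`, `score_sequence`, `stabilized`) is a hypothetical placeholder for the corresponding component described in this section, and for brevity the sketch selects whole sequences rather than the parts of each sequence presumed to belong to the motif.

```python
import random

def iterate(base_db, test_sequences, estimate_probs, fit_likelihood,
            score_sequence, stabilized, max_iterations=15, rng=None):
    """Schematic loop: score, select at random by likelihood, re-estimate."""
    rng = rng or random.Random()
    # Initial estimates come from the base motif database alone.
    probs = estimate_probs(base_db, [])
    new_db = []
    for _ in range(max_iterations):
        likelihood = fit_likelihood(probs)                      # step 1
        scored = [(seq, score_sequence(seq, probs))             # step 2
                  for seq in test_sequences]
        selected = [seq for seq, s in scored                    # step 3
                    if rng.random() <= likelihood(s)]
        probs = estimate_probs(base_db, selected)               # step 4
        done = stabilized(new_db, selected)
        new_db = selected
        if done:
            break
    return probs, new_db
```

Passing the random source in as `rng` keeps the sketch reproducible in experiments, since the selection step is the only source of randomness.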

We now describe each of the components of the algorithm in more detail, using coiled coils
as an example, although the algorithm can be applied to other protein motifs.

4.3.1 Scoring

In our implementation, we use the PairCoil program described by Berger et al. [14] as our
scoring procedure, although any good prediction algorithm with a low false positive rate can
be used for scoring. This scoring method uses correlation methods that incorporate pairwise
dependencies between amino acids at multiple distances. The scoring procedure gives a residue
score for each amino acid in a given sequence, as well as a sequence score, which is the maximum
residue score in the sequence.

In order to use this scoring procedure, we must have estimates for the probabilities for the
singles and pair positions for the motif. Initially, we have estimates for the probabilities based
on the database of sequences of the base motif, and after each iteration of the algorithm, we
use updated probabilities. In each iteration after the first, when we score a sequence we check
to see if it was identified in the previous iteration. If it was, we remove this sequence from the
database and adjust the probabilities before scoring. 

Given good estimates for the probabilities for the singles and pair positions for the motif,
and reasonable assumptions about dependencies in the motif, the PairCoil scoring method
which we use as a subroutine is mathematically justified [12]. 

4.3.2 Computing likelihoods 

Once we have a sequence score, we assess it by converting it into a likelihood that the sequence
contains the target motif. In each iteration of the algorithm, we compute a function that takes
a residue score and computes the likelihood that the residue is part of the target motif. 

We compute this likelihood function in a manner described in [14]. In particular, every
sequence in a large sequence database is scored. (Ideally, this large sequence database is the
PIR. However, in practice, to save time, we use a sampled version of the PIR, which is 1/25-th
the size; the likelihood function calculated using this sampled PIR is a good approximation
to the likelihood function calculated using the entire PIR.) The sampled PIR residue score
histograms are nearly Gaussian distributed with some extra probability mass added on the
right-hand tail. This extra mass is attributed to residues in the target motif, since they are
expected to score higher. In the case of the coiled coil motif, given the biological data currently
available, it is estimated that between 1/50 and 1/30 of residues in the PIR are in a coiled coil.
To fit a Gaussian to the histogram data, we calculate the mean so that the extra probability
mass on the right side of the mean corresponds to between 1/50 and 1/30 of the total mass of
the PIR. We then compute the standard deviation using only scores below that mean, where
a Gaussian better fits the histogram data. The likelihood that a residue with a given score is
a coiled coil is estimated as the ratio of the extra histogram mass above the Gaussian at that
score (corresponding to data assumed to be coiled) to the total histogram mass at that score.
A least squares line is then fit to approximate the likelihood function in the linear region
from 10 to 90 percent. This line then gives an approximation for the likelihoods corresponding
to all scores. 

One feature of this method of computing likelihoods is that it does not allow too many 
residues to be considered as part of coiled coils. This helps keep the false positive rate of the 
algorithm low. 
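
The Gaussian fit and the histogram ratio of this section can be sketched in Python as follows. This is one possible reading of the procedure, stated as an assumption: the mean is placed at the quantile where the mass to its left is (1 - f)/2, with f the assumed motif fraction, so that the excess mass on the right equals f. The function names, the binning scheme, and the omission of the final least squares line are simplifications made for illustration.

```python
import math

def fit_background(scores, motif_fraction):
    """Estimate (mean, std) of the Gaussian background of residue scores."""
    ordered = sorted(scores)
    n = len(ordered)
    # A symmetric background holds (1 - f)/2 of all residues below its mean.
    below = int(n * (1.0 - motif_fraction) / 2.0)
    mean = ordered[below]
    left = ordered[:below]  # use only scores below the mean for the spread
    variance = sum((s - mean) ** 2 for s in left) / len(left)
    return mean, math.sqrt(variance)

def bin_likelihoods(scores, mean, std, motif_fraction, bin_width=1.0):
    """Per-bin ratio of excess histogram mass to total histogram mass."""
    background_total = len(scores) * (1.0 - motif_fraction)
    observed = {}
    for s in scores:
        b = math.floor(s / bin_width)
        observed[b] = observed.get(b, 0) + 1
    result = {}
    for b, count in observed.items():
        center = (b + 0.5) * bin_width
        density = (math.exp(-0.5 * ((center - mean) / std) ** 2)
                   / (std * math.sqrt(2.0 * math.pi)))
        expected = background_total * density * bin_width
        result[b] = max(0.0, count - expected) / count
    return result
```

Bins far out in the right-hand tail receive a ratio close to 1, while bins near the mean receive a ratio close to 0, which is the behavior the text relies on to keep the number of accepted residues small.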

4.3.3 Randomized selection of the new database 

Once we have obtained the likelihood function for an iteration, we wish to use the likelihoods
to build a new database of sequences presumed to fold into the target motif. At the beginning
of each iteration, our new database contains no sequences. Then for each sequence in the set of
test sequences, we do the following. First, we score each sequence and then convert its sequence
score to a likelihood. Next, we draw a number uniformly at random from the interval [0, 1]. If
the number drawn is less than or equal to the likelihood of the sequence, then the sequence is
added to the new database. All residues in this sequence that have scores equal to the sequence
score or greater than the 50% likelihood score (which is the algorithm's cutoff for a residue
being in a coiled coil) are added to the database. Once we have processed every sequence in our
test set, then we have our new database of sequences presumed to fold into the target motif.
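
The selection rule can be written as a short Python sketch. The function and parameter names are hypothetical; `score_residues` stands in for the scoring procedure of section 4.3.1, `likelihood` for the function of section 4.3.2, and `cutoff_score` for the score at which the likelihood equals 50%.

```python
import random

def select_new_database(test_sequences, score_residues, likelihood,
                        cutoff_score, rng=None):
    """Return a list of (sequence, indices of residues added to the database)."""
    rng = rng or random.Random()
    new_db = []
    for seq in test_sequences:
        residue_scores = score_residues(seq)
        if not residue_scores:
            continue
        sequence_score = max(residue_scores)  # sequence score = best residue score
        # Accept the sequence with probability equal to its likelihood.
        if rng.random() <= likelihood(sequence_score):
            kept = [i for i, s in enumerate(residue_scores)
                    if s == sequence_score or s > cutoff_score]
            new_db.append((seq, kept))
    return new_db
```

Because acceptance is probabilistic, a sequence whose likelihood is below 50% still has some chance of entering the database, which a fixed 50% threshold would never allow.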

In practice, we find that adding randomness substantially improves the performance of
our algorithm. In fact, if the procedure is written just to accept sequences that have greater
than 50% likelihood, then the algorithm fails to recognize many sequences which are known to
contain 3-stranded coiled coils. On the other hand, if the procedure lowers the threshold value
for acceptance, then its false positive rate increases.

4.3.4 Updating parameters

Once we have a new database of sequences which are thought to contain the target motif, we
need to update the parameters used by the algorithm for scoring. In our case, in each iteration
of the algorithm, the scoring procedure needs updates of the estimates of probabilities for
singles and pair positions. The most straightforward way to update the probabilities is to use
a maximum likelihood estimate from frequency counts from the new database. However, this
does not work well in practice. Instead, we update each probability by taking a weighted
average of the probability given by the base motif database and the probability given by the 
new database. 

We now describe a theoretical framework for updating probabilities in this manner in each
iteration of our algorithm. The approach we give is motivated by a Bayesian viewpoint [45, 15].
In particular, we think of the probabilities we are trying to estimate as the parameters of a
Multinomial distribution, and we use the Dirichlet density to model the prior information we
have about these probabilities. In fact, the approach we give is not completely Bayesian, as we
will use the seen data to pick the parameters of the prior distribution; this is sometimes called
a Bayes/Non-Bayes compromise [45].

We will use frequency counts from our databases to estimate singles and pair probabilities. 
For simplicity, we focus on the case of updating singles probabilities; updating pair probabilities
is analogous. 

Initially, we have a database of sequences which fold into a particular base motif. Thus, for
each position in the motif, we have a 20-long count vector, with one entry for each of the 20 amino acids.
For example, for a given database of known coiled coils, for position a, we know how many
times each amino acid appears. In addition, after each iteration of the algorithm, we have a
new database of sequences that we have selected and which we presume fold into the target
motif. This new database also gives us a 20-long count vector for each position in the motif. 

We update the probabilities using these frequency count vectors. In particular, we fix a
numbering of the amino acids from 1 to 20. Then for each position $q$ in the motif (for coiled
coils, $q \in \{a, b, c, d, e, f, g\}$), we have a count vector $x^{(q)} = (x_1^{(q)}, x_2^{(q)}, \ldots, x_{20}^{(q)})$,
where $x_i^{(q)}$ is the number of times amino acid $i$ appears in position $q$ of the motif in the base
motif database. In addition, we have a count vector $y^{(q)} = (y_1^{(q)}, y_2^{(q)}, \ldots, y_{20}^{(q)})$, where
$y_i^{(q)}$ is the number of times amino acid $i$ appears in position $q$ of the motif in the new database
(i.e., the database consisting of the sequences we have picked in this iteration of the algorithm).

Let $p^{(q)} = (p_1^{(q)}, p_2^{(q)}, \ldots, p_{20}^{(q)})$ be the actual probabilities for the amino acids appearing in
position $q$ of the motif. We assume, for simplicity, that the count vectors for each position
are independent of each other. Thus, we focus on updating the probabilities of one position
independent of the other positions. For notational convenience, we fix a position and drop the
superscript $q$. We assume that for a fixed position, the count vector is generated at random
according to the Multinomial distribution with parameter $p = (p_1, p_2, \ldots, p_{20})$. The parameters
$p_1, p_2, \ldots, p_{20}$ are the "true" probabilities of seeing the amino acids in the fixed position in the
motif we are interested in. These are the parameters we wish to estimate.

In our case, we have very strong a priori knowledge about the probabilities. Since we are
trying to learn a particular target structural motif from a related base structural motif, we can
use the probabilities estimated from the base motif as prior probabilities. In fact, because these
structural motifs are related, we expect the updated probabilities for the target motif to be
similar to the original probabilities for the base motif. 

We model our a priori beliefs by the Dirichlet density. The value of a Dirichlet density
$\mathcal{D}(\alpha)$ (with parameter $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$, where $\alpha_i > 0$ and $\alpha_0 = \sum_i \alpha_i$) at a particular point
$x = (x_1, x_2, \ldots, x_k)$, where $\sum_i x_i = 1$, is given by:

$$f(x \mid \alpha) = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} x_i^{\alpha_i - 1}.$$

The gamma function $\Gamma(a)$ is:

$$\Gamma(a) = \int_0^{\infty} e^{-x} x^{a-1} \, dx.$$

The mean of the Dirichlet density is $(\alpha_1/\alpha_0, \alpha_2/\alpha_0, \ldots, \alpha_k/\alpha_0)$, and the larger $\alpha_0$ is, the smaller
the variance is.

Thus a Bayesian estimate for the probabilities $p_1, p_2, \ldots, p_{20}$ can be found by looking at
the posterior distribution. The Dirichlet distribution is conjugate for the Multinomial, and the
posterior distribution is the Dirichlet distribution $\mathcal{D}(\alpha + y)$ [15, 45]. That is, the new parameter
of the distribution is the vector sum of the original parameters and the observed data. Thus, a
Bayesian estimate for probability $p_i$ after seeing the data $y$ is

$$\frac{\alpha_i + y_i}{\alpha_0 + y_0}, \quad \text{where } y_0 = \sum_i y_i.$$



We still have not addressed the issue of how the parameters of the prior distribution are
chosen. We depart from the traditional Bayesian approach, and choose the parameters of the
prior distribution after seeing the data. In particular, since the base motif and the target motif
are related, we want the base motif database to have a strong effect on the estimates for our
probabilities, and thus we choose the variance of the prior distribution accordingly.

The mean of the Dirichlet density is specified by the estimated probabilities of the base
motif. The variance of the density is picked as follows. If $0 < \lambda < 1$ is the effect, or weight,
that we want the base motif database to have, then we let $\alpha_i = x_i \cdot \frac{\lambda y_0}{(1 - \lambda) x_0}$, where $x_0 = \sum_{i=1}^{20} x_i$
and $y_0 = \sum_{i=1}^{20} y_i$. (Actually, we have to be careful in the case where $x_i = 0$.) It is easy to
verify that our estimate for the probability $p_i$ is given by $\lambda \frac{x_i}{x_0} + (1 - \lambda) \frac{y_i}{y_0}$. Namely, our updated
probability is a weighted average of the probability given by the base motif database and the 
probability given by the new database. 
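
The equivalence between the weighted average and the Dirichlet posterior mean can be checked numerically with a small Python sketch. The function names are hypothetical, and the prior scaling used in `dirichlet_posterior_mean` (each base count multiplied by λ·y_0 / ((1 − λ)·x_0), with x_0 and y_0 the totals of the base and new count vectors) is the choice that makes the two estimates agree; the zero-count caveat mentioned above is not handled here.

```python
def weighted_update(base_counts, new_counts, weight):
    """Weighted average of base-motif and new-database frequencies."""
    x0 = sum(base_counts)
    y0 = sum(new_counts)
    return [weight * x / x0 + (1.0 - weight) * y / y0
            for x, y in zip(base_counts, new_counts)]

def dirichlet_posterior_mean(base_counts, new_counts, weight):
    """Posterior mean (alpha_i + y_i) / (alpha_0 + y_0) under the scaled prior."""
    x0 = sum(base_counts)
    y0 = sum(new_counts)
    scale = weight * y0 / ((1.0 - weight) * x0)  # assumes 0 < weight < 1
    alpha = [x * scale for x in base_counts]
    alpha0 = sum(alpha)
    return [(a + y) / (alpha0 + y0) for a, y in zip(alpha, new_counts)]
```

For example, with base counts (6, 3, 1), new counts (1, 1, 2) and weight 0.75, both functions give approximately (0.5125, 0.2875, 0.2).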

In practice, we have found that our method of updating probabilities has worked well. It
is superior to a maximum likelihood approach which uses just the current iteration's frequency
counts; those estimates of the probabilities are especially problematic in the zero frequency
case. Our method also performs better than an unweighted approach using both the initial
frequency counts and the current iteration's frequency counts. The unweighted estimates of the
probabilities are largely dependent on the size of the original database and the number of residues
that are presumed at each iteration to be part of the target motif. In our test domain of coiled
coils, we found that the unweighted method of updating probabilities missed more sequences that
contain coiled coils than did our method.
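
The difference between the three estimators is easiest to see on a zero-frequency case. The sketch below uses hypothetical counts and hypothetical function names; it only illustrates the qualitative behaviour described above and makes no claim about the actual coiled coil data.

```python
def maximum_likelihood(y):
    """Relative frequencies from the current iteration's counts only."""
    total = sum(y)
    return [yi / total for yi in y]


def unweighted(x, y):
    """Pool the initial counts x and the current counts y without weights."""
    total = sum(x) + sum(y)
    return [(xi + yi) / total for xi, yi in zip(x, y)]


def weighted(x, y, lam):
    """Weighted average with weight lam on the base database."""
    x_total, y_total = sum(x), sum(y)
    return [lam * xi / x_total + (1.0 - lam) * yi / y_total
            for xi, yi in zip(x, y)]


# Hypothetical: a large base database and a small new database in which
# the first outcome was never observed.
base_counts = [900, 100]
new_counts = [0, 10]

ml = maximum_likelihood(new_counts)        # first entry is exactly zero
pooled = unweighted(base_counts, new_counts)  # dominated by database sizes
mixed = weighted(base_counts, new_counts, 0.1)
```

Here the maximum likelihood estimate assigns probability zero to the unseen outcome, the pooled estimate is driven almost entirely by the larger base database, and the weighted estimate stays close to the new data while keeping the unseen outcome possible.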

Using Dirichlet mixture densities as priors to estimate amino acid probabilities has been
studied by Brown et al. [29]. Their approach uses as a prior the maximum likelihood estimate
of a mixture Dirichlet density, based on data previously obtained from multiple alignments
of various sets of sequences. Their approach is a pure Bayesian approach, and their prior
distribution has a smaller effect on the final probability estimates.
4.3.5 Algorithm termination

The iteration process terminates when it stabilizes; that is, when the number of residues added
from the previous iteration changes by less than 5%. Usually the procedure converges in around
six iterations; otherwise, we terminate it after 15 iterations. In practice, we found that the
algorithm rarely had to be terminated due to lack of convergence.
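
The stopping rule can be sketched as a small driver loop. This is an illustration under assumptions rather than the program's actual code: `step` is a hypothetical callback that runs one iteration and returns the number of residues added, and "changes by less than 5%" is read here as a relative change with respect to the previous iteration's count.

```python
def iterate_until_stable(step, max_iterations=15, tolerance=0.05):
    """Call step(i) for i = 1, 2, ... until the residue count stabilizes.

    Returns (iterations_run, converged). The run is declared stable when
    the number of residues added differs from the previous iteration's
    number by less than `tolerance` (relative to the previous number).
    """
    previous = None
    for iteration in range(1, max_iterations + 1):
        added = step(iteration)
        if previous is not None:
            # max(previous, 1) avoids a zero threshold when nothing was
            # added previously; then only an unchanged count is stable.
            if abs(added - previous) < tolerance * max(previous, 1):
                return iteration, True
        previous = added
    return max_iterations, False


# Hypothetical residue counts for a run that settles down.
counts = [1000, 1400, 1500, 1530, 1540]
settled = iterate_until_stable(lambda i: counts[i - 1])

# Hypothetical run that keeps growing and hits the iteration cap.
capped = iterate_until_stable(lambda i: 100 * 2 ** i)
```

The cap of 15 iterations is what keeps the overall running time bounded even when the counts never settle.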

In our implementation, the running time of the entire algorithm is linear in the total number
of residues in all sequences which are given as input. The basic operation in each iteration is
scoring every sequence using the PairCoil algorithm. For each sequence, the PairCoil scoring
program takes time linear in the number of residues. Since we have at most a fixed number of
iterations, the entire algorithm is linear-time.

After running LearnCoil, the "learned" target concept contains both 2- and 3-stranded
coiled coils. The problem of distinguishing one set from the other remains. The MultiCoil
program of Wolf, Kim, and Berger [unpublished results] is being developed for this purpose and
in initial experiments performs well.

4.4 Results 

We have implemented our algorithm in a C program called LearnCoil. We test our program on
the domain of 3-stranded coiled coils and subclasses of 2-stranded coiled coils. First we describe
the databases we use to test the program, and then we follow by describing the program's
performance.

4.4.1 The databases and test sequences 

Our original database of 2-stranded coiled coils consists of 58,217 amino acid residues which
were gathered from sequences of myosin, tropomyosin, and intermediate filament proteins [14].
We also have separate databases containing sequences from each of these protein subclasses 
individually. A synthetic peptide of tropomyosin is the only solved structure among these. 

We test LearnCoil on the 3-stranded coiled coils by starting the algorithm with the base
database of all 2-stranded coiled coils. We test LearnCoil on the 2-stranded coiled coils by
starting the algorithm with a base database of one of the subfamilies of the 2-stranded coiled
coils.

The set of iteration test sequences for testing performance on 3-stranded coiled coils consists
of the following 5516 sequences: 286 known non-coiled coils from the non-redundant version
of the PDB created in [14] (the PDB is the database of solved protein structures); 2% of the
sequences in OWL (a large non-redundant composite database, where no two sequences in
the database are exactly the same and no two sequences show only "trivial" differences [20]),
with any obvious members of the PDB removed (2815 total); sequences in OWL whose names
contain the strings actinin, alpha spectrin, dystrophin, tail fiber, laminin, fibrinogen, env,
spike, glycoprotein, bacteriophage T4 wac, bacteriophage K3 fibritin, heat shock transcription,
or macrophage scavenger receptor, as well as the 3-stranded coiled coil mutant for GCN4 (2415
total, of which many are thought to contain 3-stranded coiled coils, and the 46 sequences given
below are known to contain them).

The 3-stranded coiled coil set consists primarily of laminin and fibrinogen sequences,
as well as influenza virus hemagglutinin, Moloney murine leukemia envelope protein, 2 heat
shock transcription factors, bacteriophage T4 and K3 wac proteins, the trimeric GCN4 mutant,
2 macrophage scavenger receptors, and bacteriophage T3 and T7 tail fibers.

Our set of iteration test sequences for 2-stranded coiled coils includes: 1/23 of the PIR
(1553 total); the 286 known non-coiled coils; and two of the three subfamilies of myosins,
tropomyosins, and intermediate filaments. (For example, when we start with a database of
intermediate filaments, our iteration test sequences include myosins and tropomyosins.)

Note that most of the sequences in our 2- and 3-stranded coiled coil data sets do not have 
solved structures. However, there is strong experimental support that they contain coiled coils,
although often the boundaries of the coiled coil regions are difficult to specify exactly. We do 
not know the three dimensional structure for most of the protein sequences in our iteration test 
sets (except for the sequences from the PDB and portions of the sequences making up the 2- 
and 3-stranded coiled coil data sets). 

4.4.2 Learning 3-stranded coiled coils 

Our techniques improve on non-learning based approaches, such as PairCoil [14], which often
fail to identify 3-stranded coiled coil regions.
                                Performance without LearnCoil        Performance with LearnCoil
Base Set    Evaluation Set      % of seqs   # of false positive seqs  % of seqs   # of false positive seqs
2-str CCs   46 3-str CCs        67%         0/286                     93%         0/286

Table 4.1: Learning 3-stranded coiled coils from 2-stranded coiled coils using the leave one
out criterion.



We test the algorithm on 3-stranded coiled coils in two ways: the "leave one out" test and
the "leave half out" test. In both cases, LearnCoil improves recognition of 3-stranded coiled
coils starting with an initial database of 2-stranded coiled coils. We measure LearnCoil's
performance on the 286 non-coiled coil proteins, and an evaluation set consisting of 3-stranded
coiled coil sequences. We assume that a false negative prediction has occurred when a sequence
in the 3-stranded coiled coil evaluation set receives a score with a corresponding likelihood less
than 50%. We assume a false positive has occurred when a non-coiled coil protein scores at least
50% likelihood. Since our algorithm is randomized, the final likelihoods are found by averaging
LearnCoil outputs over five runs.
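
The evaluation rule can be written down compactly. The following sketch assumes each run is available as a mapping from a sequence identifier to a likelihood in [0, 1]; the identifiers, the toy likelihoods, and the function names are hypothetical.

```python
from statistics import mean

THRESHOLD = 0.5  # 50% likelihood cutoff used for both error types


def average_likelihoods(runs):
    """Average each sequence's likelihood over several randomized runs."""
    sequence_ids = runs[0].keys()
    return {sid: mean(run[sid] for run in runs) for sid in sequence_ids}


def evaluate(averaged, coiled_coils, non_coiled_coils, threshold=THRESHOLD):
    """Return (fraction recognized, false negatives, false positives)."""
    false_negatives = [s for s in coiled_coils if averaged[s] < threshold]
    false_positives = [s for s in non_coiled_coils
                       if averaged[s] >= threshold]
    recognized = 1.0 - len(false_negatives) / len(coiled_coils)
    return recognized, false_negatives, false_positives


# Toy data: two runs, two evaluation sequences, one non-coiled coil.
runs = [
    {"cc1": 0.9, "cc2": 0.4, "neg1": 0.1},
    {"cc1": 0.8, "cc2": 0.5, "neg1": 0.3},
]
averaged = average_likelihoods(runs)
recognized, missed, spurious = evaluate(averaged, ["cc1", "cc2"], ["neg1"])
```

Averaging before thresholding means a sequence that crosses 50% in only some runs is judged by its mean likelihood rather than by a vote.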

In the first "leave one out" scenario, the algorithm is run with all the 5516 iteration test
sequences described in section 4.4.1. Once the algorithm terminates, each of the 46 sequences
in the 3-stranded coiled coil set is scored with respect to parameters calculated from the new
database in the final iteration minus the effects of this sequence. That is, since the 46 3-
stranded coiled coil sequences are included in the iteration test set, if a sequence appears in the
final database, before scoring this sequence, the sequence is removed to avoid the possibility of
unfairly biasing the test.
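
Removing a sequence's effect before scoring it amounts to subtracting its contribution from the frequency counts. The sketch below is a simplification under assumptions: it keys counts by a hypothetical (heptad position, residue) pair, whereas the actual scoring tables are not reproduced here, and the function name is illustrative.

```python
from collections import Counter


def counts_without(database_counts, sequence_counts):
    """Return the database counts with one sequence's counts subtracted."""
    remaining = Counter(database_counts)
    remaining.subtract(sequence_counts)
    if any(value < 0 for value in remaining.values()):
        raise ValueError("sequence contributes counts the database lacks")
    # Unary plus drops entries whose count fell to zero.
    return +remaining


# Hypothetical counts keyed by (heptad position, residue).
final_database = Counter({("a", "L"): 5, ("d", "L"): 3})
held_out = Counter({("a", "L"): 2, ("d", "L"): 3})

scoring_counts = counts_without(final_database, held_out)
```

Probabilities for scoring the held-out sequence would then be estimated from `scoring_counts` rather than from the full final database.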

The weight of the original database (i.e., relative to the new database) was chosen empirically
to be $\lambda = 0.1$. This makes sense because 2- and 3-stranded coiled coils are sufficiently different;
thus, it may require much more weight for the newly identified sequences to effectively broaden
the new database to contain 3-stranded coiled coils. We also experimented with weights in the
range $0 < \lambda < 0.5$, but $\lambda = 0.1$ gave the best results.

Our algorithm LearnCoil positively identifies 43 out of 46 (93%) of the 3-stranded coiled
coil sequences and makes no false positive predictions. In contrast, PairCoil positively identifies
31 out of 46 (67%) of the 3-stranded coiled coils and also makes no false positive predictions
(see Table 4.1). Moreover, using the final databases that LearnCoil produced, we are able
to recognize all the sequences in the 2-stranded coiled coil database. Thus the final databases
produced by the LearnCoil algorithm perform well on both 2- and 3-stranded coiled coils.

In the second "leave half out" scenario, we split the 3-stranded coiled coil sequence set
in half in the following manner. First, the 46 3-stranded coiled coil sequences are divided
into the following subgroups: α-fibrinogens, β-fibrinogens, γ-fibrinogens, laminins, tail fibers,
heat shocks, and all remaining protein sequences. Next, each of these subgroups is randomly
divided into two parts, one for each half; this ensures that in the final split, each half is fairly
representative of examples of the 3-stranded coiled coil motif.
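
The splitting procedure is a stratified random halving. A minimal sketch follows; the subgroup sizes are invented for illustration (only their total of 46 comes from the text), and the rule used to place the extra member of odd-sized subgroups is an assumption chosen so that the two halves end up the same size.

```python
import random


def split_in_half(subgroups, rng):
    """Randomly divide every subgroup between two halves.

    `subgroups` maps a subgroup name to a list of sequence identifiers.
    The larger part of an odd-sized subgroup goes to whichever half is
    currently smaller, so the halves never differ by more than one.
    """
    half_a, half_b = [], []
    for name in sorted(subgroups):
        members = list(subgroups[name])
        rng.shuffle(members)
        middle = len(members) // 2
        small, large = members[:middle], members[middle:]
        if len(half_a) <= len(half_b):
            half_a += large
            half_b += small
        else:
            half_a += small
            half_b += large
    return half_a, half_b


# Hypothetical subgroup sizes summing to 46.
sizes = {"alpha_fibrinogen": 7, "beta_fibrinogen": 6, "gamma_fibrinogen": 5,
         "laminin": 9, "tail_fiber": 4, "heat_shock": 3, "other": 12}
groups = {name: [f"{name}_{i}" for i in range(size)]
          for name, size in sizes.items()}

half_a, half_b = split_in_half(groups, random.Random(0))
```

Repeating the call with three different random states would give three splits, and hence six evaluation sets of 23 sequences each.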

We split the 3-stranded coiled coil sequences 3 times in the above manner. This then gives
us six different iteration and evaluation sets. Each evaluation set consists of 23 3-stranded
coiled coil sequences, and the corresponding iteration test set consists of 5493 sequences (the
original 5516 sequences, minus the 23 sequences in the evaluation set). We run LearnCoil
on each of the six iteration test sets, and evaluate the algorithm by its performance on the
corresponding evaluation sets (namely, those 3-stranded coiled coil sequences which are not
included in the iteration test set). Note that the set of sequences with solved structures that
do not contain coiled coils is included in all iteration test sets, and these sequences are scored
using the leave one out criterion.

For each iteration test set, our algorithm is again run 5 times with $\lambda = 0.1$, and with final
likelihoods averaged over the runs. Table 4.2 gives the performance of our algorithm on the
different evaluation sets. On average, LearnCoil selects out 85% of the 3-stranded coiled coil
sequences not originally in the set of sequences upon which it iterates. In contrast, PairCoil
on average selects out 67% on the same sets of sequences. In all but one of the six experiments,
the algorithm does not get any false positives from the set of solved structures. In the one
scenario when it does get a false positive, the likelihood of all sequences in the corresponding
evaluation set (B1) that score above 50% also score higher than this false positive.

The average performance of LearnCoil on the 3-stranded coiled coil sequences included
in the iteration test set is 88%. (Individual performance data for each of the six experiments are
not shown.) This average does not seem to be significantly higher than the algorithm's average
performance on the sequences in the evaluation set. Thus, in comparing the results in Table 4.2
with the results in Table 4.1, it appears that the decreased performance on these runs with
the splits is the result of fewer 3-stranded coiled coil sequences being available to the algorithm,
and does not depend on whether the evaluation criterion is the leave one out criterion or the
leave half out criterion.

                                    Performance without LearnCoil        Performance with LearnCoil
Base Set    Evaluation Set          % of seqs   # of false positive seqs  % of seqs   # of false positive seqs
2-str CCs   Set A1, 23 3-str CCs    65%         0/286                     87%         0/286
2-str CCs   Set A2, 23 3-str CCs    70%         0/286                     83%         0/286
2-str CCs   Set B1, 23 3-str CCs    74%         0/286                     87%         1/286
2-str CCs   Set B2, 23 3-str CCs    61%         0/286                     78%         0/286
2-str CCs   Set C1, 23 3-str CCs    70%         0/286                     96%         0/286
2-str CCs   Set C2, 23 3-str CCs    65%         0/286                     78%         0/286

Table 4.2: Learning 3-stranded coiled coils from 2-stranded coiled coils using the leave half out
criterion. The 3-stranded coiled coil sequences are split 3 times, giving us six different iteration
and evaluation sets. The evaluation sets are A1, A2, B1, B2, C1, and C2 (A1 and A2 are the
result of one split, etc.).
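
The averages quoted for the leave half out experiments (67% for PairCoil and 85% for LearnCoil) can be cross-checked against the per-set percentages in Table 4.2. The sketch below assumes that each percentage is a rounded fraction of the 23 sequences in an evaluation set; the recovered counts are therefore inferred, not reported in the text.

```python
EVALUATION_SET_SIZE = 23

# (PairCoil %, LearnCoil %) for each evaluation set, copied from Table 4.2.
table_4_2 = {
    "A1": (65, 87),
    "A2": (70, 83),
    "B1": (74, 87),
    "B2": (61, 78),
    "C1": (70, 96),
    "C2": (65, 78),
}


def inferred_count(percent):
    """Number of sequences out of 23 that rounds to the given percentage."""
    return round(percent / 100 * EVALUATION_SET_SIZE)


total_sequences = EVALUATION_SET_SIZE * len(table_4_2)
paircoil_found = sum(inferred_count(p) for p, _ in table_4_2.values())
learncoil_found = sum(inferred_count(l) for _, l in table_4_2.values())

paircoil_average = round(100 * paircoil_found / total_sequences)
learncoil_average = round(100 * learncoil_found / total_sequences)
```

Under this reading the pooled counts reproduce the averages stated in the text.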



4.4.3 Learning subclasses of 2-stranded coiled coils 

Our results on subclasses of the 2-stranded coiled coil motif indicate that we are able to "learn"
coiled coil regions in one family of proteins using a database consisting of coiled coils from
another family of proteins. For example, we are able to learn coiled coils in intermediate
filaments from a database of coiled coils in either myosins or tropomyosins. Our techniques
improve on non-learning based approaches, such as the PairCoil program [14], which fail to
identify conjectured coiled coil residue positions.

We tested LearnCoil on three different domains (Table 4.3): tropomyosins (TROPs) as a
base set and myosins (MYOs) and intermediate filaments (IFs) as an evaluation set; myosins
as a base set and tropomyosins and IFs as an evaluation set; and IFs as a base set and myosins
and tropomyosins as an evaluation set. A different set of iteration test sequences was used for
each of these tests; that is, the set that includes sequences of the two protein families in the
evaluation set. For these experiments, we have residue data, and thus our performance measure
is with respect to these. False negatives are residues of sequences in the evaluation set which
do not have at least a 50% likelihood. False positives are defined as in section 4.4.2.

                             Performance without LearnCoil           Performance with LearnCoil
Base Set    Evaluation Set   % of residues   # of false positive seqs  % of residues   # of false positive seqs
TROPs       MYOs + IFs       71%             4/286                     99%             1/286
MYOs        TROPs + IFs      89%             2/286                     99%             1/286
IFs         MYOs + TROPs     83%             4/286                     99%             2/286

Table 4.3: Learning 2-stranded coiled coils from a restricted set

Here the weight of the original database was empirically chosen to be $\lambda = 0.3$. One possible
explanation for this is that, since the subclasses of 2-stranded coiled coils have more similarities
than differences, the program does not have to be so aggressive in picking up the evaluation set.
Moreover, the goal is a target set of 2-stranded coiled coils, and this is best achieved by weighting
each of the 3 types of proteins equally. We also experimented with weights of $\lambda = 0.1$ and
$\lambda = 0.5$, and while their overall performance was similar, they produced more false positives.

First, we consider experiments with tropomyosins in the base set and myosins and IFs in
the evaluation set. LearnCoil positively identifies 99% of the myosin and IF residues in
the 2-stranded database and makes one false positive prediction. This is in contrast to PairCoil,
which obtained a performance of 70.9%, with four false positive and two false negative
predictions.

Next we consider experiments with a base set of myosins and an evaluation set of tropomyosins
and IFs. LearnCoil positively identifies 99% of the tropomyosin and IF residues
and makes one false positive prediction. This is in contrast to PairCoil, which obtained a
performance of 88.8%, with two false positive and one false negative predictions.

Lastly, we consider experiments with a base set of IFs and an evaluation set of tropomyosins
and myosins. LearnCoil positively identifies 99.4% of the tropomyosin and myosin residues and
makes two false positive predictions. One possible explanation for more false positives here is
that the IFs have a less obvious coiled-coil structure and there very well may be non-coiled coil
residues in the database; consequently, starting with a table of solely IFs may select out non-coiled
coils for the target database. In contrast, PairCoil obtained a performance of 83.3%,
with four false positive predictions.

For all three above experiments, LearnCoil improved the performance of PairCoil in identifying
coiled coil residues, while also improving its false positive rate.

We also tested LearnCoil with the NewCoils program [58] used as the underlying scoring
algorithm. For subclasses of 2-stranded coiled coils, we found that LearnCoil enhanced the
performance of NewCoils as well. It obtained a performance of 96.2% when tropomyosins
were used as the base set, a performance of 95.3% when myosins were used, and a performance
of 98.2% when IFs were used. The program did not make any false positive predictions when
run on these three test domains. In contrast, the non-learning based version of NewCoils had
substantial overlap between the residue scores for coiled coils and non-coiled coils in all of the
three test domains.

4.4.4 New coiled-coil-like candidates 

The LearnCoil program has identified many new sequences that we believe contain coiled-coil-like
structures. Table 4.4 lists some examples of "newly found" viral proteins (i.e., proteins
for which PairCoil indicates that no coiled coil is present, but LearnCoil indicates a coiled-coil-like
structure is present). We believe that the proteins given in Table 4.4 either contain
coiled coils or coiled-coil-like structures. For example, recent biological work has identified a
coiled-coil-like structure which is believed to consist of a parallel, trimeric coiled coil encircled
by three helices packed in an antiparallel formation; this structure is thought to be in the
envelope glycoproteins of both HIV and SIV (Simian Immunodeficiency Virus) [19, 56].

Our program seems to be able to accurately predict this new coiled-coil-like structure. For
example, it identifies two coiled-coil-like regions in the envelope protein of SIV. Independently,
the biological investigation of SIV by Blacklow et al. predicts that these are the two regions
that are part of the coiled-coil-like structure [19]. One of these regions (comprising the outer
three helices) is predicted by the NewCoils program and is given a 26% likelihood by the
PairCoil program. The other region (comprising the trimeric coiled coil) is only predicted by
our LearnCoil program. This region corresponds to the N-terminal fragment in the paper
of Blacklow et al. In fact, the region LearnCoil predicts and the region that Blacklow et al.
find are almost identical: LearnCoil predicts a coiled-coil-like structure starting at residue
553 and ending at residue 601, whereas Blacklow et al. start the region at residue 552 and end
it at residue 604.
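
The agreement between the two regions can be quantified with a short interval computation. The sketch assumes that both sets of boundaries use the same residue numbering and are inclusive; the function names are illustrative.

```python
def region_length(region):
    """Number of residues in an inclusive (start, end) region."""
    start, end = region
    return end - start + 1


def overlap_length(first, second):
    """Number of residues shared by two inclusive (start, end) regions."""
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    return max(0, end - start + 1)


learncoil_region = (553, 601)  # predicted region quoted in the text
blacklow_region = (552, 604)   # experimentally proposed region

shared = overlap_length(learncoil_region, blacklow_region)
covered_fraction = shared / region_length(blacklow_region)
```

Under these assumptions the predicted region lies entirely inside the experimentally proposed one and covers 49 of its 53 residues.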



PIR Name                                                 LearnCoil Likelihood   PairCoil Likelihood
mouse hepatitis virus E2 glycoprotein precursor          >90%                   23%
human rotavirus A glycoprotein NCVP5                     >90%                   <10%
human respiratory syncytial virus fusion glycoprotein    >90%                   <10%
human T-cell surface glycoprotein CD4 precursor          77%                    <10%
human T-cell lymphotropic virus - type I, env            >90%                   <10%
equine infectious anemia virus, env                      >90%                   <10%
fruit fly 14-3-3 protein                                 52%                    <10%
HIV, env                                                 >90%                   <10%
SIV, env                                                 >90%                   26%

Table 4.4: Newly discovered coiled-coil-like candidates



Moreover, there is biological evidence that several other of the sequences in Table 4.4 contain
coiled-coil-like structures. Our predictions were made independently of these results. Recently,
the crystal structures of two 14-3-3 proteins have been solved [55, 75]. The paper of Liu et al.
studies the zeta isoform of the 14-3-3 structure in E. coli, and they report a 2-stranded antiparallel
coiled coil structure. On the other hand, the paper of Xiao et al. studies the human
T-cell $\tau$ dimer, and they report helical bundles. Although there is some uncertainty here, it
is likely that the 14-3-3 protein we have identified contains a coiled-coil-like structure, if not a
coiled coil itself. The human T-cell lymphotropic virus and equine infectious anemia virus are
closely related to HIV, and thus their envelope proteins are also likely to contain coiled-coil-like
structures.

The proteins reported in Table 4.4 are compared to the PairCoil program. The NewCoils
program of Lupas et al. finds some of these proteins; however, in general, this program finds
a significant number of false positives. The 14-3-3 protein, the human T-cell lymphotropic
virus envelope protein, and the human T-cell surface glycoprotein CD4 precursor are found only
using our LearnCoil program. As mentioned above, there is some biological evidence that at
least two of these proteins (the 14-3-3 protein and human T-cell lymphotropic virus envelope
protein) contain coiled-coil-like structures.

We anticipate that the identification of likely coiled-coil-like regions in important protein
sequences (such as those in Table 4.4) will facilitate and expedite the study of protein structure
by biologists. In addition, since our program LearnCoil is able to identify the new coiled-coil-like
motif in HIV and SIV, it is possible that our program will aid in the discovery of
this structure in other retroviruses.

4.5 Conclusions 

In this chapter, we have shown that a learning-based algorithm that uses randomness and
statistical techniques can substantially enhance existing methods for protein motif recognition.
We have designed a program LearnCoil and demonstrated its ability to "learn" the 2-stranded
and 3-stranded coiled coil motif. It has identified new sequences that we believe contain coiled-coil-like
structures. It is our hope that biologists will use this program to help identify other
new coiled-coil-like structures.

There is evidence that our program may have identified a new coiled-coil-like motif that
occurs in retroviruses, and future work involves studying retroviruses and this motif more
closely.

In the future we plan to apply the LearnCoil program to motifs other than those that
have coiled-coil-like properties. Limited data is a problem for many protein structure prediction
problems. There are newly discovered protein motifs which biologists cannot yet predict, and,
more importantly, for which they do not yet even know the characterizing structural features. We
hope to extend the techniques developed here to aid in the determination of crucial structural
features that give rise to these motifs, as well as to learn how to predict which proteins exhibit
these motifs.



Chapter 5 



Concluding remarks 



In this thesis, we have studied three problems in machine learning. In the first part of the thesis,
we examined Valiant's PAC model, and considered learnability in this model. In particular, we
studied concept classes of functions on k terms, and gave an algorithm for learning any function
on k terms by general DNF. On the other hand, we showed that if the learner is restricted so
that it must output a hypothesis which is a member of the concept class being learned, then
learning the concept class of any symmetric function on k terms is NP-hard (except for the
concept classes of AND, NOT AND, TRUE, and FALSE). Our results completely characterize
the learnability of concept classes of symmetric functions on k terms. We leave as an open
problem whether concept classes for more general functions on k terms can be learned when
the learner's output hypothesis is restricted.

The second part of the thesis introduced the problem of piecemeal learning an unknown
environment. For environments that can be modeled as grid graphs with rectangular obstacles,
we gave two piecemeal learning algorithms in which the robot traverses a linear number of
edges. For more general environments that can be modeled as arbitrary undirected graphs, we
gave a nearly linear algorithm. An interesting open problem is whether there exists a linear
algorithm for piecemeal learning arbitrary undirected graphs. Piecemeal learning takes into
account just one of the limitations on a robot's resources. It would be interesting to come up
with models and algorithms to handle other practical limitations of a robot, such as incorrect
data that a robot may receive (due to noisy sensors) and difficulties a robot may have in motor
control. Other extensions of the work might include the scenario of multiple robots, or multiple
"refueling stations."

In the last part of the thesis, we applied machine learning techniques to the problem of
protein folding prediction. We gave an iterative learning algorithm that is particularly effective
for folds for which there is not much currently available data. We implemented our algorithm,
and showed its effectiveness on the 3-stranded coiled coil motif. There are other motifs for
which there is a lack of data, such as β-rolls and β-helices, and it would be interesting to extend
our techniques to work on these motifs. In addition, there is evidence that our program may
have identified a new coiled-coil-like motif that occurs in retroviruses, and future work involves
studying this motif more closely.